Description

[0001] This invention relates generally to the field of speech communication and, more particularly, to discontinuous 
transmission (DTX) and to improving the quality of comfort noise (CN) during discontinuous transmission. 

[0002] Discontinuous transmission is used in mobile communication systems to switch the radio transmitter off during
speech pauses. The use of DTX saves power in the mobile station and increases the time between battery
recharges. It also reduces the general interference level and thus improves transmission quality.
[0003] However, during speech pauses the background noise which is transmitted with the speech also disappears 
if the channel is cut off completely. The result is an unnatural sounding audio signal (silence) at the receiving end of
the communication.

[0004] It is known in the art, instead of completely switching the transmission off during speech pauses, to generate 
parameters that characterize the background noise, and to send these parameters over the air interface at a low rate 
in Silence Descriptor (SID) frames. These parameters are used at the receive side to regenerate background noise 
which reflects, as well as possible, the spectral and temporal content of the background noise at the transmit side. 
These parameters that characterize the background noise are referred to as comfort noise (CN) parameters. The
comfort noise parameters typically include a subset of speech coding parameters: in particular synthesis filter coefficients
and gain parameters.

[0005] It should be noted, however, that in some comfort noise evaluation schemes of some speech codecs, part of
the comfort noise parameters are derived from speech coding parameters while other comfort noise parameters are
derived from, for example, signals that are available in the speech coder but that are not transmitted over the air
interface.

[0006] It is assumed in prior-art DTX systems that the excitation can be approximated sufficiently well by spectrally 
flat noise (i.e., white noise). In prior art DTX systems, the comfort noise is generated by feeding locally generated, 
spectrally flat noise through a speech coder synthesis filter. However, such white noise sequences are unable to produce
high quality comfort noise. This is because the optimal excitation sequences are not spectrally flat, but may have
spectral tilt or even a stronger deviation from flat spectral characteristics. Depending on the type of background noise, 
the spectra of the optimal excitation sequences may, for example, have lowpass or highpass characteristics. Because 
of this mismatch between the random excitation and the correct or optimal excitation the comfort noise generated at 
the receive side sounds different from the background noise on the transmit side. The generated comfort noise may,
for example, sound considerably "brighter" or "darker" than it should. During DTX, the spectral content of the background
noise thus changes between active speech (i.e., speech coding on) and speech pauses (i.e., comfort noise
generation on). This audible difference in the comfort noise thus causes a reduction in the transmission quality which 
can be perceived by a user. 

[0007] In speech coding systems, such as in the full rate (FR), half rate (HR), and enhanced full rate (EFR) speech
channels of the GSM system, the comfort noise parameters are transmitted at a low rate. By way of example, in the FR and
EFR channels this rate is only once per every 24 frames (i.e., every 480 milliseconds). This means that comfort noise
parameters are updated only about twice per second. This low transmission rate cannot accurately represent the
spectral and temporal characteristics of the background noise and, therefore, some degradation in the quality of background
noise is unavoidable during DTX.

[0008] A further problem that arises during DTX in digital cellular systems, such as GSM, relates to a hangover period
of a few speech frames that is introduced after a speech burst, and before the actual transmission is terminated. If the
speech burst is below some threshold duration, it can be interpreted as a background noise spike, and in this case the
speech burst is not followed by a hangover period. The hangover period is used for computing an estimate of the
characteristics of the background noise on the transmit side to be transmitted to the receive side in a comfort noise
parameter message (or Silence Descriptor (SID) frame), before the transmission is terminated. As was described
above, the transmitted background noise estimate is used on the receive side to generate comfort noise with characteristics
similar to the transmit side background noise at the time the transmission is terminated.
[0009] In known types of DTX mechanisms similar to those of GSM FR and HR, non-predictive comfort noise quantization
schemes are employed. Due to this, the receive side does not have to know if a hangover period exists at the
end of a speech burst. However, in GSM EFR, efficient predictive comfort noise quantization schemes are employed,
and the existence of a hangover period is locally evaluated at the receive side to assist in comfort noise dequantization.
This involves a small computational load and a number of program instructions to be executed.
[0010] Another problem arises if the background noise on the transmit side is not stationary but varies considerably. 
In this case there may exist a single frame or a small number of frames within an averaging period for which some or
all of the speech coding parameters provide a poor characterization of the typical background noise. A similar situation
may occur when a Voice Activity Detection or VAD algorithm interprets the unvoiced end of the period of active speech 
as "no speech", or the stationary background noise contains strong impulse-type noise bursts. Because of the short 
duration of the averaging periods in known types of DTX systems such ill-conditioned speech coding parameters may 
change the result of the averaging significantly enough that the resulting averaged CN parameters do not accurately
characterize the background noise. This results in a mismatch either in the level or in the spectrum, or both, between
the background noise and the comfort noise. The quality of transmission is thus impaired as the background noise
sounds different to the user depending on whether it is received during speech (normal speech coding of speech and
background noise) or during speech pauses (produced by comfort noise generation).

[0011] In greater detail, during the DTX hangover period any frames declared by the VAD algorithm as being "no 
speech" frames are sent over the air interface, and the speech coding parameters are buffered to be able to evaluate 
the comfort noise parameters for a first SID frame. The first SID frame is transmitted immediately after the end of the 
DTX hangover period. The length of the DTX hangover period is thus determined by the length of the averaging period. 

Therefore, to minimize the channel activity of the system, the averaging period should be fixed at a relatively short
length.

[0012] Before describing the present invention, it will be instructive to review conventional circuitry and methods for 
generating comfort noise parameters on the transmit side, and for generating comfort noise on the receive side. In this 
regard reference is thus first made to Figs. 1a-1d. 

[0013] Referring to Fig. 1a, short term spectral parameters 102 are calculated from a speech signal 100 in a Linear
Predictive Coding (LPC) analysis block 101. LPC is a method well known in the prior art. For simplicity, only the case
where the synthesis filter consists of a short term synthesis filter is discussed herein, it being realized that in most
prior art systems, such as in GSM FR, HR and EFR coders, the synthesis filter is constructed as a cascade of a short
term synthesis filter and a long term synthesis filter. However, for the purposes of this description a discussion of the
long term synthesis filter is not necessary. Furthermore, the long term synthesis filter is typically switched off during
comfort noise generation in prior art DTX systems.

[0014] The LPC analysis produces a set of short term spectral parameters 102 once for each transmission frame.
The frame duration depends on the system. For example, in all GSM channels the frame size is set at 20 milliseconds. 
[0015] The speech signal is fed through an inverse filter 103 to produce a residual signal 104. The inverse filter is
of the form:

$$A(z) = 1 - \sum_{i=1}^{M} a(i)\, z^{-i} \qquad (1)$$

[0016] The filter coefficients a(i), i=1,...,M are produced in the LPC analysis and are updated once for each frame.
Interpolation as is known in prior art speech coding may be applied in the inverse filter 103 to obtain a smooth change
in the filter parameters between frames. The inverse filter 103 produces the residual 104 which is the optimal excitation
signal, and which generates the exact speech signal 100 when fed through synthesis filter 1/A(z) 112 on the receive
side (see Fig. 1b). The energy of the excitation sequence is measured and a scaling gain 106 is calculated for each
transmission frame in excitation gain calculation block 105.
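
By way of a non-limiting illustration, the inverse filtering and gain calculation described above may be sketched in Python as follows. The sketch assumes the sign convention A(z) = 1 - sum of a(i) z^-i, a zero filter state at the start of the frame, and a root-mean-square definition of the scaling gain; the function names, coefficient values and sample values are hypothetical and are not taken from any GSM specification.

```python
import math

def inverse_filter(speech, a):
    """Apply A(z) = 1 - sum_i a(i) z^-i (assumed convention) to one frame."""
    residual = []
    for n in range(len(speech)):
        # Short term prediction from past input samples (zero initial state).
        pred = sum(a[i] * speech[n - 1 - i]
                   for i in range(len(a)) if n - 1 - i >= 0)
        residual.append(speech[n] - pred)
    return residual

def synthesis_filter(excitation, a):
    """Apply 1/A(z): each output sample is excitation plus prediction."""
    out = []
    for n in range(len(excitation)):
        pred = sum(a[i] * out[n - 1 - i]
                   for i in range(len(a)) if n - 1 - i >= 0)
        out.append(excitation[n] + pred)
    return out

def excitation_gain(residual):
    """RMS level of the residual, used here as the scaling gain (assumption)."""
    return math.sqrt(sum(e * e for e in residual) / len(residual))

a = [0.9, -0.2]                        # hypothetical coefficients, M = 2
speech = [1.0, 0.5, -0.3, 0.8, 0.1]    # hypothetical samples
residual = inverse_filter(speech, a)
rebuilt = synthesis_filter(residual, a)
gain = excitation_gain(residual)
```

Feeding the residual back through 1/A(z) reproduces the input samples, which is the sense in which the residual is the optimal excitation.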

[0017] The excitation gain 106 and short term spectral coefficients 102 are averaged over several transmission
frames to obtain a characterization of the average spectral and temporal content of the background noise. The averaging
is typically carried out over a period ranging from four frames, as in the GSM FR channel, to eight frames, as in the
GSM EFR channel. The parameters to be averaged are buffered for the duration of the averaging period in blocks 107a and 108a
(see Fig. 1d). The averaging process is carried out in blocks 107 and 108, and the average parameters that characterize
the background noise are thus generated. These are the average excitation gain g_mean and the average short term
spectral coefficients. In modern speech codecs, there are typically 10 short term spectral coefficients (M=10) which
are usually represented as Line Spectral Pair (LSP) coefficients f_mean(i), i=1,...,M, as in the GSM EFR DTX system.
Although these parameters are typically quantized prior to transmission, the quantization is ignored in this description
for simplicity, in that the exact type of quantization that is performed is irrelevant to an understanding of the operation
of the invention as described below.
[0018] Referring briefly to Fig. 1d, it is shown that the averaging blocks 107 and 108 each typically include the
respective buffers 107a and 108a, which output buffered signals 107b and 108b, respectively, to the averaging blocks.
Greater attention will be paid to the buffers 107a and 108a below when describing the embodiments of the invention 
shown in Figs. 4 and 5. 

[0019] The computation and averaging of the comfort noise parameters is explained in detail in GSM recommendation:
GSM 06.62 "Comfort noise aspects for Enhanced Full Rate (EFR) speech traffic channels". Also by way of example,
discontinuous transmission is explained in GSM recommendation: GSM 06.81 "Discontinuous Transmission (DTX) for
Enhanced Full Rate (EFR) for speech traffic channels", and voice activity detection (VAD) is explained in GSM
recommendation: GSM 06.82 "Voice Activity Detection (VAD) for Enhanced Full Rate (EFR) speech channels". As such, the
details of these various functions are not further discussed here.

EP 0 843 301 B1

[0020] Referring to Fig. 1b, there is shown a block diagram of a conventional decoder on the receive side that is
used to generate comfort noise in the prior art speech communication system. The decoder receives the two comfort
noise parameters, the average excitation gain g_mean and the set of average short term spectral coefficients f_mean(i),
i=1,...,M, and based on the parameters the decoder generates the comfort noise. The comfort noise generation operation
on the receive side is similar to speech decoding, except that the parameters are used at a significantly lower rate
(e.g., once every 480 milliseconds, as in the GSM FR and EFR channels), and no excitation signal is received from the
speech encoder. During speech decoding the excitation on the receive side is obtained from a codebook that contains
a plurality of possible excitation sequences, and an index for the particular excitation vector in the codebook is transmitted
along with the other speech coding parameters. For a detailed description of speech decoding and the use of
codebooks reference can be had to, by way of example, U.S. Patent No.: 5,327,519, entitled "Pulse Pattern Excited Linear
Prediction Voice Coder", by Jari Hagqvist, Kari Jarvinen, Kari-Pekka Estola, and Jukka Ranta, which should be read
in conjunction with this document.

[0021] During comfort noise generation, however, no index to the codebook is transmitted, and the excitation is
obtained instead from a random number or excitation (RE) generator 110. The RE generator 110 generates excitation
vectors 114 having a flat spectrum. The excitation vectors 114 are then scaled by the average excitation gain g_mean
in scaling unit 115 so that their energy corresponds to the average gain of the excitation 104 on the transmit side. A
resulting scaled random excitation sequence 111 is then input to the speech synthesis filter 112 to generate the comfort
noise output signal 113. The average short term spectral coefficients f_mean(i) are used in the speech synthesis filter 112.
[0022] Fig. 1c illustrates the spectrum associated with the signal in different parts of the prior art decoder of Fig. 1b.
The RE-generator 110 produces the random number excitation sequences 114 (and the scaled excitation 111) having 
a flat spectrum. This spectrum is shown by curve A. The speech synthesis filter 112 then modifies the excitation to 
produce a non-flat spectrum as shown in curve B. 
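
The conventional receive-side chain of Fig. 1b (RE generator 110, scaling unit 115, synthesis filter 112) may be sketched, purely for illustration, as follows. The sketch makes several assumptions that are not part of the description: the synthesis filter is given directly by coefficients a(i) of 1/A(z) with A(z) = 1 - sum of a(i) z^-i rather than by LSP coefficients, the random source is Gaussian, and the noise is normalized to unit RMS before scaling. All names and values are hypothetical.

```python
import math
import random

def synthesis_filter(excitation, a):
    """1/A(z) with A(z) = 1 - sum_i a(i) z^-i (assumed convention)."""
    out = []
    for n, e in enumerate(excitation):
        out.append(e + sum(a[i] * out[n - 1 - i]
                           for i in range(len(a)) if n - 1 - i >= 0))
    return out

def prior_art_comfort_noise(g_mean, a_mean, n_samples, seed=0):
    rng = random.Random(seed)                 # stands in for RE generator 110
    white = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]   # signal 114
    rms = math.sqrt(sum(x * x for x in white) / n_samples)
    scaled = [g_mean * x / rms for x in white]                # signal 111
    return scaled, synthesis_filter(scaled, a_mean)           # signal 113

# One 20 ms frame at an assumed 8 kHz sampling rate (160 samples).
scaled, noise = prior_art_comfort_noise(0.5, [0.9, -0.2], 160)
```

The spectral shape of the output is set entirely by the synthesis filter, since the excitation itself carries no spectral information.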

[0023] As was discussed above, a number of problems exist with respect to conventional comfort noise generation 
techniques. These problems include the mismatch between the random excitation and the correct or optimal excitation
which results in the comfort noise generated at the receive side sounding different from the actual background noise 
on the transmit side. It is a goal of this invention to reduce or eliminate these problems. 

[0024] This invention tackles the problem of generating comfort noise during discontinuous transmission so as to
minimize a loss of signal quality due to the use of discontinuous transmission. 
[0025] According to the invention there are provided a method as set out in claim 1 and an apparatus as set out in
claim 21. 

[0026] Embodiments of this invention provide comfort noise generation methods that are able to better characterize 
background noise, and that further provide an improved quality of comfort noise and an improved quality of transmission 
during discontinuous transmission. 

[0027] Embodiments of this invention provide a comfort noise generation technique that eliminates or minimizes the
generation of non-representative comfort noise, and which employs a reduced averaging time. 
[0028] In accordance with a preferred embodiment of this invention all or a predetermined number of ill-conditioned
speech coding parameters within an averaging period are removed, or replaced by applying a median replacement
method, when the parameters are averaged. In this embodiment of the invention steps are executed of measuring the
distances of the speech coding parameters from each other between individual frames within an averaging period,
ordering these parameters according to the measured distances, finding the parameters which have the largest distances
to the other parameters within the averaging period, and, if the distances exceed a predetermined threshold,
replacing these parameters with a parameter which has a smallest measured distance (i.e., a median value) to the
other parameters within the averaging period. The median valued parameter is considered to have a value which is
the most faithful representation of the characteristics of the background noise among the parameters within the averaging
period. After this procedure, the averaging of the speech coding parameters may be performed in any desired
manner. Furthermore, the teaching of this embodiment of the invention does not change the way in which the CN
parameters are received and used on the receive side of the DTX system.

[0029] In addition to removing the ill-conditioned CN parameters from the averaging period, and thereby improving
the comfort noise quality, this embodiment of the invention provides other advantages. For example, in prior art DTX
systems a longer averaging period is required to be used in order to reduce the effect of the ill-conditioned parameters
in the averaging. The use of this invention beneficially allows the use of a shorter averaging period than in prior art
DTX systems, since the effect of the ill-conditioned parameters on the averaging operation is reduced. Also, in the
prior art DTX systems a longer hangover period is required due to the longer averaging period, thereby increasing the
channel activity. The shorter averaging period made possible by this embodiment of the invention thus also enables
the DTX hangover period to be reduced, and thereby reduces channel activity. Furthermore, in the prior art DTX systems,
due to the longer averaging period employed, a significant amount of static memory is required by the CN averaging
algorithm. A further advantage of the shortened averaging period achieved by this invention is a reduction in
the amount of static memory required by the CN averaging algorithm.

[0030] Exemplary embodiments of the invention are hereinafter described with reference to the accompanying drawings,
in which:

Fig. 1a is a block diagram of conventional circuitry for generating comfort noise parameters on the transmit side.
Fig. 1b is a block diagram of a conventional decoder on the receive side that is used to generate comfort noise.
Fig. 1c illustrates the spectrum associated with the signal in different parts of the prior-art decoder of Fig. 1b.
Fig. 1d illustrates in greater detail the averaging blocks shown in Fig. 1a.

Fig. 2a is a block diagram of circuitry for generating comfort noise parameters on the transmit side.
Fig. 2b is a block diagram of a decoder on the receive side that is used to generate comfort noise.

Fig. 2c illustrates the spectrum associated with the decoder of Fig. 2b.

Fig. 3a is a block diagram of a second embodiment of circuitry for generating comfort noise parameters on the
transmit side.

Fig. 3b is a block diagram of a second embodiment of a decoder on the receive side.

Figs. 4 and 5 are each a block diagram of circuitry for evaluating comfort noise parameters on the transmit side
of a DTX digital communications system in accordance with embodiments of this invention.

Fig. 6 is a block diagram of a conventional speech encoder, Figs. 7 and 8 are timing diagrams that illustrate the
output of the conventional speech encoder of Fig. 6, and Fig. 9 is a block diagram of a conventional speech decoder,
all of which are useful in explaining the speech decoder shown in Fig. 10, which illustrates a further embodiment
of this invention.

Figs. 11a-11g illustrate exemplary frequency responses of the RESC filter.

Fig. 12 illustrates a mobile station suitable for practicing this invention, while Fig. 13 illustrates the mobile terminal
coupled to a base station of a wireless communications system that is also suitable for practicing this invention.

Fig. 14 is a timing diagram illustrating a normal hangover procedure, wherein N_elapsed indicates a number of elapsed
frames since a last occurrence of updated comfort noise (CN) parameters, and wherein N_elapsed is equal to or
greater than 24.

Fig. 15 is a timing diagram illustrating the handling of short speech bursts, wherein N_elapsed is less than 24.



[0031] A description was made previously of a conventional technique for both encoding and decoding comfort noise.
Reference is now made to Figs. 2a-2c for showing a first embodiment of circuitry and a method in accordance with
this invention. In Figs. 2a and 2b those elements that appear also in Figs. 1a and 1b are numbered accordingly.

[0032] It is first noted that "SID averaging period" is a GSM-related phrase, while "comfort noise averaging period"
or "CN averaging period" is an IS-641, Rev. A-related phrase. For the purposes of this invention these two phrases
may be used interchangeably in the following description. Likewise, the phrases "SID frame" and "comfort noise parameter
message" or "CN parameter message" may be used interchangeably.

[0033] In Fig. 2a there is shown a block diagram of apparatus for producing comfort noise parameters on the transmit
side. The novel operations in this apparatus block diagram are separated from those known from the prior art by a
dashed line 204.

The residual signal 104 output from the inverse filter 103 is subjected to a further analysis (such as LPC-analysis)
to produce another set of filter coefficients. The second analysis, which is referred to herein as random excitation (RE)
LPC-analysis 200, is typically of a lower degree than the LPC analysis carried out in block 101. The random excitation
spectral control (RESC) parameters, r_mean(i), i=1,...,R, are obtained by averaging the spectral parameters 201 from
the RE LPC-analysis block 200 over several consecutive frames in averaging block 203. The RESC parameters characterize
the spectrum of the excitation.
[0034] It should be noted that the RESC parameters are not a subset of the speech coding parameters, but are
generated and used only during comfort noise generation. The inventors have found that first or second order LPC-analysis
is sufficient to generate the RESC parameters (R=1 or 2). However, spectral models other than the all-pole
model of the LPC technique may also be used. The averaging may alternatively be carried out by the RE LPC analysis
block 200 by averaging the autocorrelation coefficients within the LPC parameter calculation, or by any other suitable
averaging technique within the LPC coefficient computation. The averaging period for the RESC parameters may be
the same as that used for the other CN parameters, but is not restricted to only the same averaging period. For example,
it has been found that longer averaging than what is used for the conventional CN-parameters can be advantageous.
Thus, instead of using an averaging period of seven frames, a longer averaging period may be preferred (e.g., 10-12
frames).
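
As one possible illustration of a first order RE LPC-analysis (R=1) in which the averaging is carried out on the autocorrelation coefficients, the following Python sketch accumulates the lag-0 and lag-1 autocorrelations of the residual over the frames of the averaging period and derives a single coefficient b(1). It assumes an all-zero inverse filter of the form 1 - b(1) z^-1 and the autocorrelation method of LPC analysis; the function names and the sample frames are hypothetical.

```python
def autocorrelation(frame, lag):
    """Autocorrelation of one residual frame at the given lag."""
    return sum(frame[n] * frame[n - lag] for n in range(lag, len(frame)))

def resc_first_order(residual_frames):
    """First order coefficient from autocorrelations summed over all frames."""
    r0 = sum(autocorrelation(f, 0) for f in residual_frames)
    r1 = sum(autocorrelation(f, 1) for f in residual_frames)
    return r1 / r0 if r0 > 0.0 else 0.0

def resc_inverse_filter(residual, b1):
    """Apply 1 - b1 z^-1 (assumed form) with zero initial state."""
    out, previous = [], 0.0
    for e in residual:
        out.append(e - b1 * previous)
        previous = e
    return out

# Two short hypothetical residual frames with a strongly lowpass character.
frames = [[1.0, 1.0, 1.0, 1.0], [2.0, 2.0, 2.0, 2.0]]
b1 = resc_first_order(frames)                    # 15 / 20 = 0.75
flattened = resc_inverse_filter(frames[0], b1)   # [1.0, 0.25, 0.25, 0.25]
```

A positive b(1) indicates a lowpass residual and a negative b(1) a highpass residual, so even a single coefficient captures the spectral tilt of the excitation.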

[0035] Prior to calculating the excitation gain, the LPC-residual 104 is fed through a second inverse filter H_RESC(z)
202. This filter produces a spectrally controlled residual 205 which generally has a flatter spectrum than the LPC-residual
104. The random excitation spectral control (RESC) inverse filter H_RESC(z) may be of the form of an all-zero
filter (but not restricted to only this form):

$$H_{RESC}(z) = 1 - \sum_{i=1}^{R} b(i)\, z^{-i} \qquad (2)$$

[0036] The excitation gain is calculated from the spectrally flattened residual 205. Otherwise the operations in Fig. 
2a are similar to those described above with regard to Fig. 1a. 

[0037] Referring now to Fig. 2b, there is shown a block diagram of a decoder on the receive side that is used to generate
comfort noise according to the present invention. In the decoder, the excitation 212 is formed by first generating the
white noise excitation sequence 114 with the random excitation generator 110, which is then scaled by g_mean in scaling
block 115.

[0038] The spectrally flat noise sequence 111 is then processed in a random excitation spectral control (RESC) filter
211, which produces an excitation having a correct spectral content. The RE spectral control filter 211 performs the
inverse operation to the RESC inverse filter 202 employed in the encoder of Fig. 2a. Using the RESC inverse filter of
equation (2) on the transmit side, the RE spectral control filter 211 used on the receive side is of the form

$$\frac{1}{H_{RESC}(z)} = \frac{1}{1 - \sum_{i=1}^{R} b(i)\, z^{-i}} \qquad (3)$$

[0039] The RESC parameters r_mean(i), i=1,...,R that define the filter coefficients b(i), i=1,...,R are transmitted as part
of the CN parameters to the receive side, and are used in the RE spectral control filter 211 so that the excitation for
the synthesis filter 112 is suitably spectrally weighted, and is thus generally not spectrally flat. The RESC parameters
r_mean(i), i=1,...,R may be the same as the filter coefficients b(i), i=1,...,R, or they may use some other parameter representation
that enables efficient quantization for transmission, such as LSP coefficients. Figs. 11a-11g illustrate exemplary
frequency responses of the RESC filter 211.

[0040] In review, the CN-excitation generator 210 generates a spectrally flat random excitation in the RE generator
110. The spectrally flat excitation is then suitably scaled by the average gain scaler 115. To produce the correct spectrum
for the comfort noise, and to avoid a mismatch between the spectrum of the comfort noise and that of the background
noise, the random excitation is fed through the RE spectral control filter 211. The spectrally controlled excitation 212
is then used in the speech synthesis filter 112 to produce comfort noise that has an improved match to the spectrum
of the actual background noise that is present at the transmit side.
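
The receive-side processing reviewed above (RE generator 110, gain scaler 115, RE spectral control filter 211, synthesis filter 112) may be sketched as follows, as an illustration only. The sketch assumes that both all-pole filters have the form 1/(1 - sum of c(i) z^-i), that the synthesis filter is supplied as direct-form coefficients rather than LSP coefficients, and that the noise is Gaussian and normalized to unit RMS before scaling. The function names and parameter values are hypothetical.

```python
import math
import random

def all_pole(x, c):
    """Filter x by 1/(1 - sum_i c(i) z^-i) with zero initial state."""
    y = []
    for n, xn in enumerate(x):
        y.append(xn + sum(c[i] * y[n - 1 - i]
                          for i in range(len(c)) if n - 1 - i >= 0))
    return y

def all_zero(x, c):
    """Filter x by 1 - sum_i c(i) z^-i (the matching inverse filter)."""
    return [x[n] - sum(c[i] * x[n - 1 - i]
                       for i in range(len(c)) if n - 1 - i >= 0)
            for n in range(len(x))]

def comfort_noise(g_mean, b_mean, a_mean, n_samples, seed=0):
    rng = random.Random(seed)                       # RE generator 110
    white = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    rms = math.sqrt(sum(v * v for v in white) / n_samples)
    scaled = [g_mean * v / rms for v in white]      # scaler 115, signal 111
    shaped = all_pole(scaled, b_mean)               # RESC filter 211, signal 212
    return all_pole(shaped, a_mean)                 # synthesis filter 112

out = comfort_noise(2.0, [0.5], [0.9, -0.2], 160)
# Undoing both filters recovers the scaled white excitation.
recovered = all_zero(all_zero(out, [0.9, -0.2]), [0.5])
```

With an empty RESC coefficient list the chain reduces to the conventional decoder, which makes the contribution of the RESC filter easy to isolate.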

[0041] The RESC parameters are not a subset of the speech coding parameters that are used during speech signal
processing, but are instead calculated only during the comfort noise calculation. The RESC parameters are computed
and transmitted only for the purpose of generating improved excitation for comfort noise during speech pauses. The
RESC inverse filter 202 in the encoder and the RESC filter 211 in the decoder are used only for the purpose of controlling
the spectrum of the random excitation.

[0042] Fig. 2c illustrates the spectrum of certain signals within the decoder of Fig. 2b during the generation of comfort 
noise. The RE generator 110 produces the random number sequences having the flat spectrum shown in curve A.
This spectrum is identical to that shown in curve A of Fig. 1c. Signals 114 and 111 both have this flat spectrum, it being
noted that the gain scaling that occurs in block 115 does not affect the shape of the spectrum. The white noise sequence
111 is then fed through RE spectrum control filter 211 to produce the excitation 212 to the LPC synthesis filter. The
improved excitation sequence 212 generally has a non-flat spectrum (curve C), and the effect of this non-flat spectrum
is observed in the spectrum of the output signal 113 of the synthesis filter 112 (curve D). The excitation sequence 212
may be lowpass or highpass type, or may exhibit a more sophisticated frequency content (depending on the degree
of the RESC filter). The spectrum control is determined by the RESC parameters, which are computed on the transmit
side and transmitted as part of the comfort noise parameters to the receive side, as was described above.

[0043] Contrasting Fig. 3a to Fig. 2a, it can be observed that the calculation of the excitation gain is carried out from
the LPC residual 104, and not from the residual from the RESC inverse filter 202. The RESC inverse filter 202 is thus
not required in Fig. 3a, and can be eliminated. The decoder on the receive side for use with the encoder of Fig. 3a is
shown in Fig. 3b. When compared to Fig. 2b, it can be noted that the scaling (block 115) of the excitation is moved to
the output of the RE spectrum control filter 211. Otherwise the operation of the encoder and decoder of Figs. 3a and
3b is similar to that shown in Figs. 2a and 2b.

[0044] Referring now to Fig. 4, there is shown a block diagram of circuitry for evaluating comfort noise parameters
on the TX side according to an embodiment of this invention. This embodiment addresses the above-mentioned problems
that arise when there exists a single frame or a small number of frames within an averaging period for which
some or all of the speech coding parameters give a poor characterization of the typical background noise. The operations
according to this embodiment of the invention are separated from those known from the prior art by the dashed
lines 300 and 310. According to this embodiment of the invention, the speech coding parameters which are buffered
in blocks 107a and 108a are subjected to a thresholded median replacement process before they are applied to averaging
blocks 107 and 108 for computing the average excitation gain g_mean and the average short term spectral coefficients
f_mean(i). In this process, the parameters within the averaging period whose values are not typical of the background
noise are replaced, if specific conditions are met, by the parameter values which are considered as typical of
the actual background noise, i.e., the median values.

[0045] First, the operations indicated by the block 300 that are performed on the scalar valued excitation gain parameters
g prior to averaging in block 107 are discussed. The set of excitation gain values 107b buffered in block 107a
over the averaging period are forwarded to block 301, in which they are ordered according to their values. Each of the
excitation gain values has its own index within the set. The ordered set of gain parameters 302 is forwarded to a median
replacement block 303, in which those L excitation gain values differing the most from the median value, provided that the
difference exceeds the predetermined threshold value, are replaced by the median value of the parameter set. The
differences between each individual parameter value and the median value are computed in block 304, and the indices
of the excitation gain values for which the absolute value of this computed difference exceeds a threshold are communicated
as signal 305 to the median replacement block 303.

[0046] The length N of the averaging period is preferably an odd number. In this case, the median of the ordered set
is its ((N+1)/2)th element. The variable L, which determines the number of replaced parameters, may assume a value
between 0 and N-1. L may also be a predetermined value (i.e., a constant).

[0047] If there exist individual excitation gain values such that the difference between the excitation gain value and
the median value exceeds the predetermined threshold, the selector 307 is switched to the position in which excitation
gain values 309 for the averaging block 107 are obtained from the median replacement block 303 as signal 308.
However, if for each of the excitation gain values the difference between the gain value and the median value does
not exceed the predetermined threshold, the selector 307 is switched such that the parameters 309 input to the averaging
block 107 are obtained directly from the buffer block 107a.
[0048] The switching state of selector 307 is controlled by the threshold block 304 with signal 306.
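
The thresholded median replacement of the scalar excitation gains described in the preceding paragraphs may be sketched as follows. This is an illustrative sketch only: the function name, the threshold value, the limit on the number of replaced values and the buffered gains are hypothetical, N is assumed to be odd, and the role of selector 307 is represented implicitly, in that the buffered values are returned unchanged when no difference exceeds the threshold.

```python
def median_replace(gains, threshold, max_replaced):
    """Replace up to max_replaced outlying gains by the median of the set."""
    n = len(gains)
    ordered = sorted(gains)                      # ordering, block 301
    median = ordered[(n + 1) // 2 - 1]           # ((N+1)/2)th element, N odd
    diffs = [abs(g - median) for g in gains]     # differences, block 304
    # Indices whose difference exceeds the threshold, largest difference first.
    outliers = sorted((i for i in range(n) if diffs[i] > threshold),
                      key=lambda i: diffs[i], reverse=True)
    cleaned = list(gains)
    for i in outliers[:max_replaced]:            # replacement, block 303
        cleaned[i] = median
    return cleaned

gains = [1.0, 1.1, 0.9, 5.0, 1.05, 0.95, 1.0]    # N = 7, one noise spike
cleaned = median_replace(gains, threshold=1.0, max_replaced=2)
g_mean = sum(cleaned) / len(cleaned)             # averaging, block 107
```

In this example the plain average of the buffered gains would be pulled up to roughly 1.57 by the single spike, whereas the average after replacement stays at the level of the remaining six frames.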

[0049] Next, the operations of block 310 are discussed with regard to the LSP coefficients f(k), k=1,...,M, prior to
averaging in block 108. The set of LSP coefficients 108b buffered in block 108a over the averaging period are forwarded
to block 311. The spectral distance of the LSP coefficients f_i(k) of the ith frame in the averaging period, to the LSP
coefficients f_j(k) of the jth frame in the averaging period, is approximated according to the following equation:

$$\Delta R_{ij} = \sum_{k=1}^{M} \left( f_i(k) - f_j(k) \right)^2 \qquad (4)$$

where M is the degree of the LPC model, and f_i(k) is the kth LSP parameter of the ith frame in the averaging period.
[0050] To find the spectral distance ΔS_i of the LSP coefficients f_i(k) of frame i to the LSP coefficients of all the other
frames j=1,...,N, i≠j, within the averaging period of length N, the sum of the spectral distances ΔR_ij is calculated as
follows:

$$\Delta S_i = \sum_{j=1}^{N} \Delta R_{ij} \qquad (5)$$

for all i=1,...,N (ΔR_ii = 0, i.e., the distance of a parameter from itself is zero). The operations expressed in equations (4)
and (5) are carried out in block 311.

[0051] The spectral distance can be approximated using a number of other representations of the LPC filter, for
example, see A.H. Gray, Jr. and J.D. Markel, "Distance measures for speech processing," IEEE Transactions on
Acoustics, Speech, and Signal Processing, Vol. 24, pp. 380-391, 1976. Immittance Spectral Pairs (ISP) can also be utilized
in the same way as line spectral pairs, for example see Y. Bistritz and S. Peller, "Immittance spectral pairs (ISP) for speech
encoding," in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing,
Minneapolis, Minnesota, Vol. 2, pp. 9-12, 27-30 April 1993.

[0052] After the spectral distances ΔS_i have been found in block 311 for each of the LSP vectors f_i within the averaging
period, these distances 312 are forwarded to block 313. In the ordering block 313, the spectral distances are ordered
according to their values. Each of the spectral distance values is related by an index to one LSP vector within the
averaging period. The vector f_i with the smallest distance ΔS_i within the averaging period i=1,2,...,N is considered as
the median vector f_med of the averaging period. Its distance is denoted as ΔS_med.

[0053] The set of LSP coefficient vectors f_i within the averaging period is ordered in block 313 according to the
ordering found for the spectral distances. This ordered set of LSP vectors 314 obtained from block 313 is forwarded
to the median replacement block 315. In block 315, P (0 ≤ P ≤ N-1) LSP vectors f_i are replaced by the median f_med. The
indices of these P vectors are determined by comparing ΔS_i for i=1,2,...,N with the median ΔS_med in block 316. Hence
the indices of f_i for which ΔS_i - ΔS_med is greater than a threshold are communicated by signal 317 to the median
replacement block 315.

[0054] If the difference ΔS_i - ΔS_med is greater than a threshold for some i=1,2,...,N, the selector 319 is switched into
such a position that the averaging block 108 receives the parameters 321 from the median replacement block 315 as
signal 320. However, if ΔS_i - ΔS_med is smaller than a threshold for all i=1,2,...,N, the selector 319 is switched to the
position in which the input signal 321 to the averaging block 108 is obtained directly from the buffer block 108a through
signal 108b.

[0055] The selector 319 is controlled by the threshold block 316 with signal 318.
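
The vector case of blocks 311 to 319 can be sketched as follows, using the squared-difference distance of equation (4) and the summed distance of equation (5). The function names, the threshold, and the toy one-dimensional vectors are hypothetical; a real LSP vector would have M elements.

```python
def spectral_distance(f_i, f_j):
    """Equation (4): sum of squared LSP differences between two frames."""
    return sum((a - b) ** 2 for a, b in zip(f_i, f_j))


def summed_distances(vectors):
    """Equation (5): distance of each frame to all frames (the i == j term is zero)."""
    return [sum(spectral_distance(f_i, f_j) for f_j in vectors) for f_i in vectors]


def median_replace_vectors(vectors, threshold, max_replaced):
    """Replace up to max_replaced (the variable P) outlier vectors by the median vector.

    Hypothetical helper; returns the cleaned vectors and the median index.
    """
    dists = summed_distances(vectors)
    # The median vector is the one with the smallest summed distance.
    med_idx = min(range(len(vectors)), key=dists.__getitem__)
    ds_med = dists[med_idx]
    # Block 316: frames whose distance exceeds the median distance by more
    # than the threshold are candidates for replacement.
    outliers = [i for i, d in enumerate(dists) if d - ds_med > threshold]
    outliers.sort(key=lambda i: dists[i], reverse=True)
    cleaned = [list(v) for v in vectors]
    for i in outliers[:max_replaced]:
        cleaned[i] = list(vectors[med_idx])   # block 315
    return cleaned, med_idx


if __name__ == "__main__":
    frames = [[0.0], [1.0], [2.0], [10.0], [1.0]]
    print(summed_distances(frames))
    print(median_replace_vectors(frames, threshold=100.0, max_replaced=2))
```

If no distance exceeds the threshold, the function returns the buffered vectors unchanged, which corresponds to the selector 319 taking its input directly from the buffer.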

[0056] Fig. 5 shows another embodiment of the invention. In this embodiment the operations according to this
invention are distinguished from those known from the prior art by the dashed line 400. While in the embodiment shown
in Fig. 4 and described above the median operations are performed independently for the excitation gain values g and
the LSP vectors f_i, in the embodiment of Fig. 5 these two parameter sets are handled together as follows.
[0057] If it is determined that the parameters in an individual frame are to be replaced by the median values, then
both the excitation gain value g and the LSP vector f_i of that frame are replaced by the respective parameters of the
frame containing the median parameters.

[0058] In order to find the ordering of the frames for median replacement, the equation (4) of the approximated
distance ΔR_ij between the parameters of the ith frame and the jth frame of the averaging period is revised to take into
account both the excitation gain value g and the LSP vector f_i as follows:

$$\Delta T_{ij} = \sum_{k=1}^{M} \left( f_i(k) - f_j(k) \right)^2 + w \left( g_i - g_j \right)^2 \qquad (6)$$

where M is the degree of the LPC model, f_i(k) is the kth LSP parameter of the ith frame of the averaging period, and
g_i is the excitation gain parameter of the ith frame.

[0059] To find the distance ΔS_i of the parameters of frame i, for all i=1,...,N, to the parameters of all the other frames
j=1,...,N, i≠j, within the averaging period of length N, equation (5) is applied after computing ΔT_ij. Distance ΔT_ij is then
used instead of distance ΔR_ij in equation (5). The procedures expressed by equations (5) and (6) are carried out in
block 401. The weighting factor w is chosen to obtain a subjectively preferred compromise between performing the
median replacement according to the excitation gain values or according to the spectral distances. The subjectively
preferred compromise is found by carrying out tests with typical users.
[0060] After the distances ΔS_i have been found in block 401 for each of the frames within the averaging period, these
distances 402 are forwarded to ordering block 403. In the ordering block 403 the distances are ordered according to
their values. Each of the distances is related by an index to one frame within the averaging period. The frame with the
smallest distance ΔS_i within the averaging period i=1,2,...,N is considered as the median frame of the averaging period,
with parameters g_med and f_med. Its distance is denoted as ΔS_med.

[0061] The excitation gain values to be ordered in block 403 are forwarded to the block by signal 107b from buffer
107a, and the LSP coefficients are forwarded to the block by signal 108b from buffer 108a. As was stated above, the
set of parameters within the averaging period is ordered in block 403 according to the ordering found for their spectral
distances ΔS_i. The ordered set of parameters obtained from block 403 is forwarded as signals 404 and 405 to the
median replacement block 406. In block 406, parameters g_i and f_i of L (0 ≤ L ≤ N-1) frames are replaced by the
parameters g_med and f_med of the median frame. The indices of these L frames are determined by comparing ΔS_i for
i=1,2,...,N with the median ΔS_med in block 407, and communicated to the median replacement block 406 as signal 408. If the
difference ΔS_i - ΔS_med is greater than a threshold in block 407, the parameters g_i and f_i are replaced by g_med and f_med
in median replacement block 406. The value of L may be bounded by predetermined minimum and maximum values.

[0062] If the difference ΔS_i - ΔS_med is greater than a threshold for some i=1,2,...,N, the selector 410 is switched such
that the averaging block 108 receives the parameters 321 from the median replacement block 406 as signal 411, and
the averaging block 107 receives the parameters 309 from the median replacement block 406 as signal 412. However,
if ΔS_i - ΔS_med is smaller than a threshold for all i=1,2,...,N, the selector 410 is switched such that the input signal 321
to the averaging block 108 is obtained directly from the buffer block 108a through signal 108b, and the input signal
309 to the averaging block 107 is obtained directly from the buffer block 107a through signal 107b. The selector 410
is controlled by the threshold block 407 with signal 409.
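
The joint handling of gains and LSP vectors in the embodiment of Fig. 5 can be sketched as below. The sketch assumes the combined distance is the squared LSP difference plus w times the squared gain difference; the function names, the weight, and the threshold values are hypothetical.

```python
def joint_distance(f_i, g_i, f_j, g_j, w):
    """Equation (6): LSP distance plus weighted squared gain difference."""
    lsp_term = sum((a - b) ** 2 for a, b in zip(f_i, f_j))
    return lsp_term + w * (g_i - g_j) ** 2


def frame_distances(lsp_vectors, gains, w):
    """Equation (5) applied to the joint distance: one summed distance per frame."""
    n = len(gains)
    return [
        sum(joint_distance(lsp_vectors[i], gains[i], lsp_vectors[j], gains[j], w)
            for j in range(n))
        for i in range(n)
    ]


def replace_frames(lsp_vectors, gains, w, threshold, max_replaced):
    """Replace both parameters of up to max_replaced outlier frames (block 406)."""
    dists = frame_distances(lsp_vectors, gains, w)
    med = min(range(len(gains)), key=dists.__getitem__)   # median frame
    outliers = [i for i, d in enumerate(dists) if d - dists[med] > threshold]
    outliers.sort(key=lambda i: dists[i], reverse=True)
    new_lsp = [list(v) for v in lsp_vectors]
    new_gains = list(gains)
    for i in outliers[:max_replaced]:
        new_lsp[i] = list(lsp_vectors[med])   # f_i replaced by f_med
        new_gains[i] = gains[med]             # g_i replaced by g_med
    return new_lsp, new_gains


if __name__ == "__main__":
    lsp = [[0.0], [0.0], [0.0]]
    g = [1.0, 2.0, 9.0]
    print(frame_distances(lsp, g, w=1.0))
    print(replace_frames(lsp, g, w=1.0, threshold=50.0, max_replaced=1))
```

Setting w to zero reduces the ordering to the purely spectral one, while a large w lets the gain dominate; the specification leaves the actual value to listening tests.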

[0063] As an alternative to subtracting the median distance from an individual distance (i.e., computing ΔS_i - ΔS_med), the
deviation of each individual distance from the median distance can be computed in blocks 316 and 407 by, for
example, dividing an individual distance by the median distance (i.e., by computing ΔS_i/ΔS_med). This may be a preferred
method in most cases, since it finds a relative, or normalized, deviation of an individual distance from the median
distance, independent of the absolute values of the distances ΔS_i and ΔS_med.
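
The difference between the two deviation measures can be seen in a small sketch. The function name and the threshold values are assumptions for illustration; the handling of a zero median distance is likewise an assumption, since the text does not address it.

```python
def deviates(ds_i, ds_med, threshold, relative=True):
    """Return True if a frame's distance deviates too much from the median distance.

    relative=True  uses the ratio ds_i / ds_med (normalized deviation);
    relative=False uses the difference ds_i - ds_med (absolute deviation).
    """
    if relative:
        if ds_med == 0:
            # Assumption: the ratio is undefined here, so refuse to decide.
            raise ValueError("median distance is zero; ratio test undefined")
        return ds_i / ds_med > threshold
    return ds_i - ds_med > threshold


if __name__ == "__main__":
    # The same frames measured on two different scales.
    print(deviates(30.0, 10.0, threshold=2.0))                       # ratio test
    print(deviates(3000.0, 1000.0, threshold=2.0))                   # same decision
    print(deviates(30.0, 10.0, threshold=25.0, relative=False))      # difference test
    print(deviates(3000.0, 1000.0, threshold=25.0, relative=False))  # decision changes
```

The ratio test gives the same decision when all distances are scaled by a common factor, whereas the difference test would need its threshold retuned.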

[0064] Before describing a further embodiment of this invention, reference is made to Fig. 6, which is a simplified
block diagram of the transmit (TX) side speech encoder DTX system. The incoming signal 601 from an analog-to-digital
converter 600 is processed frame by frame in the speech encoder 602. As before, the length of the frame is
typically 20 msec. The sampling frequency of the speech signal 601 is generally 8 kHz. The speech encoder 602
encodes the input speech frame by frame into a set of parameters 603 which are sent to the radio subsystem 611 of
the digital mobile radio unit for transmitting to the receive (RX) side.

[0065] The operation of the DTX mechanism is indirectly controlled by a voice activity detection (VAD) performed on
the TX side. The basic function of the VAD 604 is to distinguish between noise with speech present and noise without
speech present. The VAD 604 operates continuously to evaluate whether the input signal contains speech or does not
contain speech. The operation of the VAD 604 is based on the speech encoder 602 and its internal variables 605. The
output of the VAD 604 is a binary VAD flag 606 which is equal to one when speech is present, and which is equal to
zero when speech is not present. The VAD 604 operates on a frame by frame basis, as is specified in, for example,
GSM 06.82.

[0066] The speech encoder DTX handler 612 continuously passes traffic frames, individually marked by a binary SP
flag 607, to the radio subsystem 611. The SP flag 607 indicates to the radio subsystem 611 whether a traffic frame
passed by the DTX handler 612 is a speech frame (SP flag = "1") or a so-called Silence Descriptor (SID) frame, or
Comfort Noise Parameter message (SP flag = "0"). The radio subsystem 611 controls the scheduling of the frames
for transmission on the air interface, based on the state of the SP flag 607.

[0067] A fundamental problem associated with the foregoing use of DTX is that the background acoustic noise, which
is transmitted together with the speech, may disappear when the transmission over the air interface is terminated,
resulting in discontinuities of the background noise on the RX side. Since the DTX switching can occur rapidly, it has
been found that this effect can be objectionable to the listener. This is particularly true in environments with a high
background noise level, such as a vehicle. At worst, this effect may result in the speech becoming unintelligible.

[0068] A presently preferred solution to this problem is to generate, on the RX side, synthetic noise (i.e., comfort
noise) similar to the TX side background noise when the transmission is terminated. As was described above, the
required parameters for comfort noise generation are evaluated in the speech encoder on the TX side (block 608 in
Fig. 6) and are transmitted to the RX side in SID frames before the radio transmission is switched off, and at a repetitive
low rate thereafter. This allows the comfort noise generated during speech inactivity on the RX side to adapt to the
changes of the background noise on the TX side.

[0069] It has been found that comfort noise of good subjective quality can be generated on the RX side if the comfort
noise parameters evaluated on the TX side appropriately represent the level and the spectral envelope of the acoustic
background noise. These characteristics of background noise often vary slightly with time, and therefore in order to
obtain a good representation, the parameters of the speech encoder describing the level and the spectral envelope of
the background noise need to be averaged over a few speech frames. In the DTX systems of the GSM full rate and
enhanced full rate speech coders (see GSM 06.31 and GSM 06.81), the length of the SID averaging period is four
speech frames and eight speech frames, of 20 milliseconds duration, respectively.

[0070] In order to evaluate and transmit the first SID frame containing comfort noise parameters to the RX side at
the end of a speech burst, before the transmission is switched off, the above-mentioned hangover period is introduced.
The hangover period is a period during which speech inactivity has been detected by the VAD 604 (i.e., VAD flag 606
= "0"), but the transmission of speech frames has not yet been switched off (i.e., SP flag 607 = "1"). Reference in this
regard may also be had to Fig. 7. During the hangover period, since the VAD 604 has detected speech inactivity, it is
guaranteed that the speech frames contain only noise (and not speech), and thus these hangover frames can be used
for the averaging of speech encoder parameters to evaluate the comfort noise parameters.

[0071] The length of the hangover period is determined by the length of the SID averaging period, i.e., the length of
the hangover period must be long enough to complete the averaging of the parameters before the resulting comfort
noise parameters are to be transmitted in a SID frame. In the DTX system of the GSM full rate speech coder, the length
of the hangover period equals four frames (the length of the SID averaging period), since the comfort noise evaluation
technique uses only parameters from the previous frames to make an updated SID frame available. In the DTX system
of the GSM enhanced full rate speech coder, the length of the hangover period equals seven frames (the length of the
SID averaging period minus one), since the parameters of the eighth frame of the SID averaging period can be obtained
from the speech encoder while processing the first SID frame. Fig. 7 illustrates the concepts of the hangover period
and the SID averaging periods in the DTX system of the GSM enhanced full rate speech coder.
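
The relationship between the VAD flag, the hangover period, and the SP flag can be sketched as a per-frame loop. This is a deliberately simplified illustration: the function name is hypothetical, every speech burst is assumed to be followed by a full hangover, and the scheduling details of the GSM recommendations are not modelled.

```python
def sp_flags(vad_flags, hangover_frames):
    """Derive a per-frame SP flag sequence from VAD flags (simplified sketch).

    While VAD = 1 the frame is speech (SP = 1). After VAD drops to 0, the next
    'hangover_frames' frames are still passed as speech frames (SP = 1) so that
    their parameters can be averaged; only then does SP become 0.
    """
    sp = []
    remaining = 0                     # hangover frames still to be sent
    for vad in vad_flags:
        if vad == 1:
            sp.append(1)
            remaining = hangover_frames
        elif remaining > 0:
            sp.append(1)              # hangover: noise-only frame sent as speech
            remaining -= 1
        else:
            sp.append(0)              # SID frame or no transmission
    return sp


if __name__ == "__main__":
    vad = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
    print(sp_flags(vad, hangover_frames=7))   # enhanced full rate: seven frames
    print(sp_flags(vad, hangover_frames=4))   # full rate: four frames
```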

[0072] At the end of the hangover period the first SID frame is transmitted, and the comfort noise evaluation algorithm
continues evaluating the characteristics of the background noise and passes the updated SID frames to the radio
subsystem 611 frame by frame, as long as the VAD 604 continues to detect speech inactivity. The TX DTX handler
612 informs the comfort noise evaluation algorithm 608 of the completion of a SID averaging period using a flag 609.
The flag 609 is normally reset to "0", and is raised to "1" whenever an updated SID frame is to be passed to the radio
subsystem 611. When the flag 609 is raised, the comfort noise evaluation algorithm 608 performs the averaging of
parameters to make an updated SID frame available for the radio subsystem 611. The updated SID frames are sent
to the radio subsystem 611, as well as written to a SID memory block 610, which stores the most recent SID frame for
later use.

[0073] If, at the end of the speech burst, fewer than 24 frames have elapsed since the last SID frame was computed
and passed to the radio subsystem, then the last SID frame is repeatedly fetched from the SID memory 610 and passed
to the radio subsystem 611. This occurs until a new updated SID frame is available, i.e., this process continues until
the SID averaging period is again completed. This technique reduces the transmission activity in cases when short
background noise spikes are interpreted as speech, since there is no need to insert the hangover period at the end of
the speech burst to be able to compute a new SID frame.

[0074] Fig. 8 shows as an example the longest possible speech burst without hangover. The binary flag 613 is used
for signalling the SID memory 610 when to store the new, updated SID frame in the SID memory 610, and when to
send the most recent updated SID frame from the SID memory 610 to the radio subsystem 611. The SID memory 610
determines whether to store or send the SID frame during each frame when the SP flag 607 is "0".

[0075] The binary flag 614 is also needed, in the DTX system of the GSM enhanced full rate speech coder, to inform
the noise evaluation algorithm about the end of the hangover period. The flag 614 is normally reset to "0", and is raised
to "1" for the duration of one frame when the first SID frame after a speech burst is to be sent, if preceded by the
hangover period.

[0076] Fig. 9 is a block diagram of the speech decoder of the receive (RX) side of the DTX system. The incoming
set of speech coder parameters 701 from the radio subsystem 700 of the digital mobile radio unit is processed frame
by frame in the speech decoder 702 to synthesize a speech signal 703 which is provided to a digital-to-analog converter
704. The digital-to-analog converter 704 generates an audio signal for the listening user.

[0077] The RX DTX system receives from the radio subsystem the binary SP flag 705, which mirrors the operation
of the SP flag of the TX side, i.e., the SP flag = "1" when a speech frame is received, and SP flag = "0" when either a
SID frame is received, or the transmission is terminated. The binary flag 706, also received from the radio subsystem
700, informs the comfort noise generation algorithm 707 of the existence of a new received SID frame, i.e., the flag is
normally reset to "0", and is raised to "1" whenever the SP flag 705 is "0" and a new SID frame is received.
[0078] When the SP flag 705 = "0", i.e., the discontinuous transmission is active, the comfort noise generation block
707 of the speech decoder 702 generates comfort noise based on the representation of the characteristics of the
background noise on the TX side, as received in the SID frames. Updated SID frames are received at a repetitive low
rate during discontinuous transmission, and the decoded comfort noise parameters are interpolated between the updated
SID frames to provide smooth transitions in the characteristics of the comfort noise.
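
The interpolation between successive SID updates can be illustrated with a short sketch. The text states only that the decoded parameters are interpolated; the linear rule, the function name, and the update interval of 24 frames used below are assumptions made for illustration.

```python
def interpolate_parameters(previous, latest, frame_in_period, period):
    """Linearly interpolate comfort noise parameters between two SID updates.

    previous, latest : parameter lists decoded from consecutive SID frames
    frame_in_period  : frames elapsed since 'latest' was received (0..period)
    period           : assumed number of frames between SID updates
    """
    alpha = frame_in_period / period      # 0.0 -> previous values, 1.0 -> latest
    return [(1.0 - alpha) * p + alpha * n for p, n in zip(previous, latest)]


if __name__ == "__main__":
    old_gain, new_gain = [0.0], [24.0]
    for frame in (0, 6, 12, 24):
        print(frame, interpolate_parameters(old_gain, new_gain, frame, period=24))
```

With this rule the comfort noise level glides from the old value to the new one over the update interval instead of stepping at each SID frame.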

[0079] In the DTX system of the GSM full rate speech encoder, whenever a new, updated SID frame is to be computed
and sent to the radio subsystem 611 (Fig. 6), the parameters describing the characteristics (the level and the spectrum)
of the background noise are averaged over the SID averaging period and scalar quantized, using the same quantizing
schemes as used for quantizing in the normal speech encoding mode. Likewise, when a SID frame arrives in the GSM
full rate speech decoder 702, the silence descriptor parameters are decoded using the same dequantization schemes
as used in the normal speech decoding mode (e.g., see GSM 06.12).

[0080] In the DTX system of the GSM enhanced full rate speech encoder, the parameters describing the spectrum 
of the background noise (the LSP parameters) are averaged over the SID averaging period when a new SID frame is 
to be computed, and vector quantized using predictive quantization tables which are also used for quantization of these 
parameters in the normal speech encoding mode. In the decoder 702 these spectral parameters are dequantized using 
the same predictive dequantization tables as used in the normal speech decoding mode. The parameters describing 
the level of the background noise (the fixed codebook gain) are averaged over the SID averaging period when a new 
SID frame is to be computed, and quantized using the scalar predictive quantization table which is also used for quan- 
tization of these parameters in the normal speech encoding mode. In the decoder, these gain parameters are dequan- 
tized using the same predictive dequantization table as used in ordinary speech decoding mode (see GSM 06.62). 
[0081] However, the adaptivity of the predictive quantizers makes it difficult to employ this type of quantization
scheme for quantizing comfort noise parameters to be sent in SID frames. Since the transmission is terminated during
speech inactivity, there is no way to maintain the predictors in the quantizer and the dequantizer of the encoder and
decoder, respectively, synchronized on a frame-by-frame basis. However, the predictor values for the quantizers can
be evaluated locally in the encoder and decoder in the same way, as follows. The quantized LSP and fixed codebook
gain parameters of the seven most recent speech frames are stored locally both in the encoder 602 and decoder 702.
When the hangover period at the end of a speech burst has ended, these stored parameters are averaged. The obtained
averaged parameters, which are the reference LSP parameter vector f^ref and the reference fixed codebook gain g_c^ref,
then have the same values both in the encoder 602 and in the decoder 702 since, due to quantization, the same
quantized LSP and fixed codebook gain values are available in both during the normal speech encoding mode
(assuming an error free transmission). The averaged values of the reference LSP parameter vector f^ref and the
reference fixed codebook gain g_c^ref are then frozen until the next time the hangover period occurs after a speech burst, and
used instead of the normal predictors in the quantization algorithms for quantization of the comfort noise parameters.
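
The local evaluation of the frozen reference values can be sketched as follows. The class and method names are hypothetical; the sketch only shows that two sides storing the same quantized parameters of the seven most recent frames arrive at identical references, which then stay fixed until the next hangover period ends.

```python
from collections import deque


class ReferenceTracker:
    """Keeps the quantized parameters of the most recent speech frames and
    derives frozen reference values (f_ref, g_ref) at the end of a hangover."""

    def __init__(self, history=7):
        self.lsp_history = deque(maxlen=history)    # quantized LSP vectors
        self.gain_history = deque(maxlen=history)   # quantized fixed codebook gains
        self.f_ref = None
        self.g_ref = None

    def store_speech_frame(self, quantized_lsp, quantized_gain):
        self.lsp_history.append(list(quantized_lsp))
        self.gain_history.append(quantized_gain)

    def end_of_hangover(self):
        """Average the stored values; the result is frozen until the next call."""
        n = len(self.gain_history)
        if n == 0:
            raise ValueError("no speech frames stored")
        order = len(self.lsp_history[0])
        self.g_ref = sum(self.gain_history) / n
        self.f_ref = [sum(v[k] for v in self.lsp_history) / n for k in range(order)]


if __name__ == "__main__":
    encoder, decoder = ReferenceTracker(), ReferenceTracker()
    for k in range(1, 9):                            # eight frames; seven are kept
        for side in (encoder, decoder):
            side.store_speech_frame([1.0 * k, 2.0 * k], float(k))
    encoder.end_of_hangover()
    decoder.end_of_hangover()
    print(encoder.f_ref, encoder.g_ref)
    print(decoder.f_ref == encoder.f_ref, decoder.g_ref == encoder.g_ref)
```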
[0082] Referring once more to Fig. 9, a RX DTX handler 708 receives the SP flag 705 as input, and outputs the
binary flag 709, which is normally reset to "0", and which is set to "1" for the duration of one frame when the hangover
period has occurred after a speech burst. The flag 709 is required in the DTX system of the GSM enhanced full rate
speech decoder 702 to inform the comfort noise generation algorithm 707 when to perform averaging to update the
reference LSP parameter vector f^ref and the reference fixed codebook gain g_c^ref (see GSM 06.62). A method for
determining the value of flag 709 is described in an earlier filed Finnish patent application FI953252, and in corresponding
U.S. Patent Application S.N. 08/672,932, filed June 28, 1996, and in PCT application "PCT/FI96/00369", which should
be read in conjunction with this document.

[0083] In summary, in many modern speech coders the speech coding parameters are quantized using predictive
methods. This implies that in the quantizer, an attempt is made to predict the value to be quantized as closely as
possible. In these types of predictive quantizers, the difference or the quotient between the actual parameter value
and the predicted parameter value is typically quantized and sent to the receive side. On the receive side, the
corresponding dequantizer has a similar predictor as the quantizer. As such, the parameter value quantized on the TX side
can be reproduced by adding or multiplying the received difference or quotient value, respectively, with the predicted
value.

[0084] In such predictive quantizers, the predictor is typically made adaptive so that the result of the quantization is 
used to update the predictor after each quantization. The predictors of the quantizer and the dequantizer are both 
updated using the reproduced, quantized parameter value, in order to keep the predictors synchronized. 
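
A minimal sketch of an adaptive predictive quantizer of the difference type is given below. The first-order predictor (the previous reproduced value), the uniform step size, and the class name are assumptions chosen for brevity; actual speech coders use tabulated quantizers and more elaborate predictors.

```python
class PredictiveQuantizer:
    """Difference-type predictive quantizer with an adaptive predictor.

    The same class serves as quantizer (encode) and dequantizer (decode);
    each side holds its own instance and updates its predictor with the
    reproduced, quantized value only.
    """

    def __init__(self, step=0.5, initial_prediction=0.0):
        self.step = step
        self.prediction = initial_prediction

    def encode(self, value):
        index = round((value - self.prediction) / self.step)
        self.prediction += index * self.step     # update with the reproduced value
        return index

    def decode(self, index):
        value = self.prediction + index * self.step
        self.prediction = value                   # identical update on the RX side
        return value


if __name__ == "__main__":
    tx, rx = PredictiveQuantizer(), PredictiveQuantizer()
    for x in (1.0, 1.4, 2.1):
        idx = tx.encode(x)
        print(x, idx, rx.decode(idx), tx.prediction == rx.prediction)
```

The predictors remain equal only as long as every index produced by the encoder reaches the decoder; skipping even one frame on the receive side leaves the two predictors with different values.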
[0085] The adaptivity of the predictive quantizers makes it difficult to employ this type of quantization scheme for
quantizing comfort noise parameters that are sent in SID frames. Since the transmission is terminated during speech
inactivity, there is no way to keep the predictors in the quantizer and the dequantizer of the encoder 602 and decoder
702 synchronized on a frame-by-frame basis.

[0086] It would, however, be desirable to be able to employ the same quantizing tables, for quantization of comfort 
noise parameters, as are used by the predictive quantizers in the ordinary speech encoding mode. This would require 
the prediction to be performed in a non-adaptive fashion during the discontinuous transmission. The predictors should 
have values as close to the average parameter values of the present background noise as possible, in order for the 
quantizers to be able to encode the fluctuations in the parameter values due to changes in the characteristics of the 
background noise. The same predicted values should, preferably, be available in the quantizer and in the dequantizer. 
[0087] As was indicated previously, one technique to obtain good predicted values for quantizing the comfort noise
to be sent in SID frames is to store the quantized parameter values in the normal speech encoding mode during the
hangover period, and to compute an average of the stored, quantized parameter values at the end of the hangover
period. The averaged predictor values are then frozen until the next hangover period occurs. However, a problem with
this method is that the speech decoder 702, in those DTX techniques that are similar to that of GSM, does not know
when a hangover period exists at the end of a speech burst.

[0088] An aspect of this invention is thus to provide a technique to inform the speech decoder 702 of the existence
of a hangover period at the end of a speech burst. This is accomplished, preferably, by sending the hangover period 
information as side information in the SID frame (or comfort noise parameter message) from the speech encoder 602 
to the speech decoder 702. 

[0089] To illustrate the method according to this aspect of the invention, reference is made to Fig. 10. In Fig. 10 the
binary flag 709 is no longer generated by the RX DTX handler, but instead is transmitted from the encoder 602 and is
received from the transmission channel in the first SID frame. The RX DTX handler block 708 is thus no longer required
for the purposes of dequantization using the predictive methods described in this invention, since the flag 709 is not
required to be generated locally at the decoder 702. In accordance with this aspect of the invention, the flag 709 is
raised to a "1" in the first SID frame, if the first SID frame is preceded by a hangover period. If the first SID frame is
not preceded by a hangover period, the flag 709 in the first SID frame is reset to "0". In the second and further SID
frames of the comfort noise insertion period, the flag 709 is always reset to "0".
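
The rule for setting the transmitted flag 709 can be written as a short sketch. The function names are hypothetical, and the way the flag would actually be packed into a SID frame is not shown.

```python
def sid_hangover_flags(num_sid_frames, preceded_by_hangover):
    """Value of flag 709 for each SID frame of one comfort noise insertion period.

    Only the first SID frame may carry a "1", and only if a hangover period
    preceded it; all further SID frames carry "0".
    """
    flags = []
    for n in range(num_sid_frames):
        is_first = (n == 0)
        flags.append(1 if is_first and preceded_by_hangover else 0)
    return flags


def decoder_should_update_references(received_flag):
    """Decoder side: update the reference values only when the flag is raised."""
    return received_flag == 1


if __name__ == "__main__":
    print(sid_hangover_flags(4, preceded_by_hangover=True))
    print(sid_hangover_flags(4, preceded_by_hangover=False))
```

Because the decision arrives with the SID frame, the decoder needs no local logic to work out whether a hangover period occurred.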

[0090] An advantage of this aspect of the invention is that there is no need for the speech decoder DTX handler 708 
to determine locally the existence of the hangover period at the end of the speech burst. This eliminates a portion of 
the computational load from the speech decoder 702, and reduces the number of program instructions used by the 
RX DTX handler 708. 

[0091] A further advantage, related to providing the decoder 702 the information concerning the existence of the 
hangover period, is that it now becomes possible to re-initialize the pseudonoise excitation generators synchronously 
at the encoder 602 and the decoder 702 each time a hangover period ends. 
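
Synchronous re-initialization of the excitation generators can be illustrated with a seeded pseudorandom generator. The use of Python's `random.Random`, the seed value, and the class name are assumptions for illustration; an actual codec would use its own pseudonoise generator.

```python
import random


class ExcitationGenerator:
    """Pseudonoise excitation source that can be reset to a known state."""

    def __init__(self, seed=12345):
        self._seed = seed                 # assumed shared constant on both sides
        self._rng = random.Random(seed)

    def reset(self):
        """Re-initialize, e.g. when the received flag marks the end of a hangover."""
        self._rng.seed(self._seed)

    def next_samples(self, count):
        return [self._rng.uniform(-1.0, 1.0) for _ in range(count)]


if __name__ == "__main__":
    encoder_gen, decoder_gen = ExcitationGenerator(), ExcitationGenerator()
    encoder_gen.next_samples(5)           # the two sides have drifted apart
    encoder_gen.reset()                   # hangover period ends: both sides reset
    decoder_gen.reset()
    print(encoder_gen.next_samples(3) == decoder_gen.next_samples(3))
```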

[0092] Another advantage related to providing the decoder 702 the information concerning the existence of the hang- 
over period is that the interpolation of the received comfort noise parameters can be performed in different ways, 
depending on whether or not the hangover period is present at the end of a speech burst, in order to reduce the 
perceived step-like changes in the level or spectrum of comfort noise when short speech bursts occur. 
[0093] Before further describing the operation of this invention in detail, reference is made to Figs. 12 and 13 for
illustrating a wireless user terminal or mobile station 10, such as but not limited to a cellular radiotelephone or a personal
communicator, that is suitable for practicing this invention. The mobile station 10 includes an antenna 12 for transmitting
signals to and for receiving signals from a base site or base station 30. The base station 30 is a part of a cellular network
that may include a Base Station/Mobile Switching Center/Interworking function (BMI) 32 that includes a mobile switching
center (MSC) 34. The MSC 34 provides a connection to landline trunks when the mobile station 10 is involved in a call.
In the context of this disclosure the mobile station 10 may be referred to as the transmission side and the base station
as the receive side. The base station 30 is assumed to include suitable receivers and speech decoders for receiving
and processing encoded speech parameters and also DTX comfort noise parameters, as described below.
[0094] The mobile station includes a modulator (MOD) 14A, a transmitter 14, a receiver 16, a demodulator (DEMOD)
16A, and a controller 18 that provides signals to and receives signals from the transmitter 14 and receiver 16,
respectively. These signals include signalling information in accordance with the air interface standard of the applicable cellular
system, and also user speech and/or user generated data. The air interface standard is assumed for this invention to
include a physical and logical frame structure, although the teaching of this invention is not intended to be limited to
any specific structure, or for use only with an IS-136 or similar compatible mobile station, or for use only in TDMA type
systems. The air interface standard is also assumed to support a DTX mode of operation.

[0095] It is understood that the controller 18 also includes the circuitry required for implementing the audio and logic 
functions of the mobile station. By example, the controller 18 may be comprised of a digital signal processor device, 
a microprocessor device, and various analog to digital converters, digital to analog converters, and other support cir- 
cuits. The control and signal processing functions of the mobile station are allocated between these devices according 
to their respective capabilities. The controller 18 is assumed for the purposes of this disclosure to include the necessary 
speech coder and other functions for implementing the improved comfort noise generation and DTX methods and 
apparatus of this invention. These functions can be implemented wholly in software, wholly in hardware, or in a mixture 
of hardware and software. 

[0096] A user interface includes a conventional earphone or speaker 17, a speech transducer such as a conventional
microphone 19 in combination with an A/D converter and a speech encoder, a display 20, and a user input device, 
typically a keypad 22, all of which are coupled to the controller 18. The keypad 22 includes the conventional numeric 
(0-9) and related keys (#,*) 22a, and other keys 22b used for operating the mobile station 10. These other keys 22b 
may include, by example, a SEND key, various menu scrolling and soft keys, and a PWR key. The mobile station 10 
also includes a battery 26 for powering the various circuits that are required to operate the mobile station. 
[0097] The mobile station 10 also includes various memories, shown collectively as the memory 24, wherein are
stored a plurality of constants and variables that are used by the controller 18 during the operation of the mobile station.
For example, the memory 24 stores the values of various cellular system parameters and the number assignment
module (NAM). An operating program for controlling the operation of controller 18 is also stored in the memory 24
(typically in a ROM device). The memory 24 may also store data, including user messages, that is received from the
BMI 32 prior to the display of the messages to the user. The memory 24 also includes routines for implementing the
methods described below with regard to the transmission of comfort noise parameters during DTX operation.

[0098] It should be understood that the mobile station 10 can be a vehicle mounted or a handheld device. It should
further be appreciated that the mobile station 10 can be capable of operating with one or more air interface standards,
modulation types, and access types. By example, the mobile station may be capable of operating with any of a number
of other standards besides IS-136, such as GSM. It should thus be clear that the teaching of this invention is not to be
construed to be limited to any one particular type of mobile station or air interface standard.

[0099] Although the invention is described next specifically in the context of an IS-1 36 embodiment, ft is again noted 
that the teaching of this invention is not limited to only this one air interface standard. 

[0100] With regard to DTX on a digital traffic channel (IS-136.1, Rev. A, Section 2.3.11.2), when in the DTX-High
state the transmitter 14 radiates at a power level indicated by the most recent power-controlling order (Initial Traffic
Channel Designation message, Digital Traffic Channel (DTC) Designation message, Handoff message, Dedicated
DTC Handoff message, or Physical Layer Control message) received by the mobile station 10.
[0101] In the DTX-Low state, the transmitter 14 remains off. The CDVCC is not sent except for the transmission of
Fast Associated Control Channel (FACCH) messages. All Slow Associated Control Channel (SACCH) messages to
be transmitted by the mobile station 10, while in the DTX-Low state, are sent as FACCH messages, after which the
transmitter 14 returns again to the off state unless Discontinuous Transmission (DTX) has been otherwise inhibited.
[0102] When the mobile station 10 desires to switch from the DTX-High state to the DTX-Low state, it may complete
all in-progress SACCH messages in the DTX-High state, or terminate SACCH message transmission and resend the
interrupted SACCH messages, in their entirety, as FACCH messages in the DTX-Low state.
[0103] When a mobile station switches from the DTX-High state to the DTX-Low state, it must pass through a transition
state in which the transmitted power is at the DTX-High level until all pending FACCH messages have been entirely
transmitted.

[0104] In the preferred embodiment of this invention the mobile station 10 remains in the transition state until a
Comfort Noise Block (comprising six DTX hangover slots and the related Comfort Noise Parameter message) has
been entirely transmitted. The Comfort Noise Block is sent without interruption. If some other FACCH message slots
coincide with the sending of the Comfort Noise Block, the mobile station 10 delays the transmission of either the FACCH
message or the Comfort Noise Block so as to transmit one before the other, but in any case the FACCH messages
are effectively grouped or segregated such that they do not interrupt or steal the slots used for the transmission of the
Comfort Noise Block. This ensures the best available quality of the comfort noise that is generated at a base station
voice/comfort noise decoder.

[0105] Reference in this regard is made to commonly assigned and copending U.S. patent application S.N.
08/936,755, filed 9/25/97, entitled "Transmission of Comfort Noise Parameters During Discontinuous Transmission",
by Seppo Alanara and Pekka Kapanen.

[0106] In accordance with a specific embodiment, the Comfort Noise (CN) Parameter Message, shown below in
Table 1, is transmitted on the reverse digital traffic channel (RDTC), specifically the FACCH logical channel, and contains
38 bits, of which 26 bits contain an LSF residual vector which is quantized using the same split vector quantization
(SVQ) codebook as used in the IS-641 speech codec. The quantization/dequantization algorithms of the speech codec
are modified to make it possible to use this codebook. The LSF parameters give an estimate of the spectral envelope
of the background noise at the transmit side using, preferably, a 10th order LPC model of the spectrum.
[0107] The next 8 bits contain a comfort noise energy quantization index, which describes the energy of the background
noise at the transmit side. The remaining 4 bits in the message are used for transmitting a Random Excitation
Spectral Control (RESC) information element.



Table 1: Message Format

Information Element             Type    Length (bits)
Protocol Discriminator          M       2
Message Type                    M       8
LSF residual vector             M       26
CN energy quantization index    M       8
RESC parameters                 M       4



To summarize, the problems discussed in the Background section of this patent application are addressed by generating,
on the receive side, a synthetic noise similar to the transmit side background noise. The comfort noise (CN)
parameters are estimated on the transmit side and transmitted to the receive side before the radio transmission is
switched off, and at a regular low rate afterwards. This allows the comfort noise to adapt to the changes of the noise
on the transmit side. The DTX mechanism in accordance with this invention employs: a Voice Activity Detector (VAD)
function 21 (Fig. 12) on the transmit side; an evaluation in the controller 18 of the background acoustic noise on the
transmit side, in order to transmit characteristic parameters to the receive side; and a generation on the receive side
of a similar noise, referred to as comfort noise, during periods where the radio transmission is switched off.

[0108] In addition to these functions, if the parameters arriving at the receive side are found to be seriously corrupted
by errors, the speech or comfort noise is instead generated from substituted data in order to avoid generating annoying
audio effects for the listener.

[0109] The transmit side DTX function continuously passes traffic frames, each marked by a flag SP, to the radio
transmitter 14, where the SP flag = "1" indicates a speech frame, and where the SP flag = "0" indicates an encoded
set of Comfort Noise parameters. The scheduling of the frames for transmission on the air interface is controlled by
the radio transmitter 14, on the basis of the SP flag.

[0110] In a preferred embodiment of this invention, and to allow an exact verification of the transmit side DTX functions,
all frames before the reset of the mobile station 10 are treated as if they were speech frames for an infinitely
long time. Therefore, the first 6 frames after the reset are always marked with SP flag = "1", even if VAD flag = "0"
(hangover period, see Fig. 14).

[0111] The Voice Activity Detector (VAD) 21 operates continuously in order to determine whether the input signal
from the microphone 19 contains speech. The output is a binary flag (VAD flag = "1" or VAD flag = "0", respectively)
on a frame by frame basis.

[0112] The VAD flag controls indirectly, via the transmit side DTX handler operations described below, the overall
DTX operation on the transmit side.

[0113] Whenever the VAD flag = "1", the speech encoded output frame is passed directly to the radio transmitter 14,
marked with the SP flag = "1".

[0114] At the end of a speech burst (transition VAD flag = "1" to VAD flag = "0"), it requires seven consecutive frames
to make a new updated set of CN parameters available. Normally, the first six speech encoder output frames after the
end of the speech burst are passed directly to the radio transmitter 14, marked with the SP flag = "1", thereby forming
the "hangover period". The first new set of CN parameters is then passed to the radio transmitter 14 as the seventh
frame after the end of the speech burst, marked with the SP flag = "0" (see Fig. 14).

[0115] If, however, at the end of the speech burst, fewer than 24 frames have elapsed since the last set of CN parameters
was computed and passed to the radio transmitter 14, then the last set of CN parameters is repeatedly passed
to the radio transmitter 14, until a new updated set of CN parameters is available (seven consecutive frames marked
with VAD flag = "0"). This reduces the activity on the air interface in cases where short background noise spikes are
interpreted as speech, by avoiding the "hangover" waiting for the CN parameter computation. Fig. 15 shows as an
example the longest possible speech burst without hangover.
[0116] Once the first set of CN parameters after the end of a speech burst has been computed and passed to the
radio transmitter 14, the transmit side DTX handler continuously computes and passes updated sets of CN parameters
to the radio transmitter 14, marked with the SP flag = "0", so long as the VAD flag = "0".
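
The hangover rules of paragraphs [0110] and [0113] to [0116] can be illustrated with a minimal sketch. It is not the patented implementation: the function name `sp_flags`, its parameters, and the simplification that every SP = "0" frame resets the elapsed-frame counter are assumptions made for illustration only.

```python
def sp_flags(vad_flags, hangover=6, min_elapsed=24):
    """Map per-frame VAD flags (1 = speech) to SP flags (1 = speech frame).

    Illustrative sketch only; names and the counter handling are assumptions.
    """
    sp = []
    # Frames elapsed since CN parameters were last passed on. Starting at
    # min_elapsed models "speech for an infinitely long time" before reset.
    elapsed = min_elapsed
    remaining = 0   # hangover frames still to be sent as speech frames
    prev_vad = 1    # frames before the reset are treated as speech
    for vad in vad_flags:
        if vad == 1:
            sp.append(1)
            remaining = 0
            elapsed += 1
        else:
            if prev_vad == 1:
                # End of a speech burst: use a hangover period only when
                # the last CN parameter set is at least min_elapsed frames old.
                remaining = hangover if elapsed >= min_elapsed else 0
            if remaining > 0:
                sp.append(1)
                remaining -= 1
                elapsed += 1
            else:
                sp.append(0)   # a CN parameter set is passed for this frame
                elapsed = 0
        prev_vad = vad
    return sp
```

With two speech frames followed by eight noise frames, the sketch marks six hangover frames as speech and the seventh noise frame as the first CN frame. A five-frame burst that follows shortly after a CN update produces no hangover.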

[0117] The speech encoder is operated in a normal speech encoding mode if the SP flag = "1" and in a simplified 
mode if the SP flag = "0", because not all encoder functions are required for the evaluation of CN parameters. 

[0118] In the radio transmitter 14 the following traffic frames are scheduled for transmission: all frames marked with
the SP flag = "1"; the first frame marked with the SP flag = "0" after one or more frames with the SP flag = "1"; those
frames marked with SP = "0" and scheduled for transmission of CN parameter update messages.
[0119] This has the overall effect of transitioning to the DTX-Low state after the transmission of a CN parameter
message when the speaker stops talking. During speech pauses the transmission is resumed at, for example, regular
intervals for transmission of one CN parameter message, in order to update the generated comfort noise on the receive
side.

[0120] The comfort noise evaluation algorithm uses the unquantized and quantized (e.g.) Linear Prediction (LP)
parameters of the speech encoder, using the Line Spectral Pair (LSP) representation, where the unquantized Line
Spectral Frequency (LSF) vector is given by $f^t = [f_1\ f_2\ \ldots\ f_{10}]$ and the quantized LSF vector by
$\hat{f}^t = [\hat{f}_1\ \hat{f}_2\ \ldots\ \hat{f}_{10}]$, with $t$ denoting transpose. The algorithm also uses the
LP residual signal r(n) of each subframe for computing the random excitation gain and the Random Excitation
Spectral Control (RESC) parameters.

[0121] The algorithm computes the following parameters to assist in comfort noise generation: the reference LSF
parameter vector $\hat{f}^{ref}$ (average of the quantized LSF parameters of the hangover period); the averaged LSF parameter
vector $f^{mean}$ (average of the LSF parameters of the seven most recent frames); the averaged random excitation gain
$g_{cn}^{mean}$ (average of the random excitation gain values of the seven most recent frames); the random excitation gain $g_{cn}$;
and the RESC parameters $\Lambda$.

[0122] These parameters give information on the spectrum ($f$, $\hat{f}$, $\hat{f}^{ref}$, $f^{mean}$, $\Lambda$) and the level
($g_{cn}$, $g_{cn}^{mean}$) of the background noise.

[0123] Three of the evaluated comfort noise parameters ($f^{mean}$, $\Lambda$, and $g_{cn}^{mean}$) are encoded into a special FACCH
message, referred to herein as the Comfort Noise (CN) parameter message, for transmission to the receive side. Since
the reference LSF parameter vector $\hat{f}^{ref}$ can be evaluated in the same way in the encoder and decoder, as described
below, no transmission of this parameter vector is necessary.
[0124] The CN parameter message also serves to initiate the comfort noise generation on the receive side, as a CN
parameter message is always sent at the end of a speech burst, i.e., before the radio transmission is terminated.

[0125] The scheduling of CN parameter messages or speech frames on the radio path was described above with
reference to Figs. 7 and 8.

[0126] The background noise evaluation involves computing three different kinds of averaged parameters: the LSF
parameters, the random excitation gain parameter, and the RESC parameters. The comfort noise parameters to be
encoded into a Comfort Noise parameter message are calculated over the CN averaging period of N=7 consecutive
frames marked with VAD = "0", as described in greater detail below.

[0127] Prior to averaging the LSF parameters over the CN averaging period, a median replacement is performed on
the set of LSF parameters to be averaged, to remove the parameters which are not characteristic of the background
noise on the transmit side. First, the spectral distances from each of the LSF parameter vectors f(i) to the other LSF
parameter vectors f(j), i = 0...6, j = 0...6, $i \neq j$, within the CN averaging period are approximated according to the equation:

$$\Delta R_{ij} = \sum_{k=1}^{10} \left( f_i(k) - f_j(k) \right)^2 \qquad (4)$$

where $f_i(k)$ is the kth LSF parameter of the LSF parameter vector f(i) at frame i.
[0128] To find the spectral distance $\Delta S_i$ of the LSF parameter vector f(i) to the LSF parameter vectors f(j) of all other
frames j = 0...6, $j \neq i$, within the CN averaging period, the sum of the spectral distances $\Delta R_{ij}$ is computed as follows:

$$\Delta S_i = \sum_{j=0,\ j \neq i}^{6} \Delta R_{ij} \qquad (5)$$

for all i = 0...6, $i \neq j$.

[0129] The LSF parameter vector f(i) with the smallest spectral distance $\Delta S_i$ of all the LSF parameter vectors within
the CN averaging period is considered as the median LSF parameter vector $f_{med}$ of the averaging period, and its
spectral distance is denoted as $\Delta S_{med}$. The median LSF parameter vector is considered to contain the best representation
of the short-term spectral detail of the background noise of all the LSF parameter vectors within the averaging
period. If there are LSF parameter vectors f(j) within the CN averaging period with:

$$\frac{\Delta S_j}{\Delta S_{med}} > TH_{med} \qquad (6)$$

where $TH_{med}$ = 2.25 is the median replacement threshold, then at most two of these LSF parameter vectors (the LSF
parameter vectors causing $TH_{med}$ to be exceeded the most) are replaced by the median LSF parameter vector prior
to computing the averaged LSF parameter vector $f^{mean}$.
[0130] The set of LSF parameter vectors obtained as a result of the median replacement are denoted as f(n-i), where
n is the index of the current frame, and i is the averaging period index (i = 0...6).

[0131] When the median replacement is performed at the end of the hangover period (first CN update), all of the
LSF parameter vectors f(n-i) of the six previous frames (the hangover period, i = 1...6) have quantized values, while
the LSF parameter vector f(n) at the most recent frame n has unquantized values. In the subsequent CN update, the
LSF parameter vectors of the CN averaging period in those frames overlapping with the hangover period have quantized
values, while the parameter vectors of the more recent frames of the CN averaging period have unquantized values.
If the period of the seven most recent frames is non-overlapping with the hangover period, the median replacement of
LSF parameters is performed using only unquantized parameter values.
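
The median replacement of equations (4) to (6) can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `median_replace`, the list-of-lists representation of the LSF vectors, and the tie-breaking rule (the first vector with the smallest distance is taken as the median) are hypothetical and are not taken from the specification.

```python
def median_replace(lsf_vectors, th_med=2.25, max_replaced=2):
    """Replace up to max_replaced outlier LSF vectors by the median vector.

    Illustrative sketch of equations (4)-(6); names are assumptions.
    Returns the processed vectors and the index of the median vector.
    """
    n = len(lsf_vectors)

    def distance(a, b):
        # Equation (4): squared Euclidean distance between two LSF vectors.
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # Equation (5): summed distance of each vector to all other vectors.
    ds = [sum(distance(lsf_vectors[i], lsf_vectors[j])
              for j in range(n) if j != i)
          for i in range(n)]

    med = min(range(n), key=lambda i: ds[i])   # first minimum wins on ties
    out = [list(v) for v in lsf_vectors]
    if ds[med] == 0:
        return out, med                        # all vectors are identical

    # Equation (6): candidates whose distance ratio exceeds the threshold,
    # worst offenders first, limited to max_replaced replacements.
    outliers = sorted((j for j in range(n) if ds[j] / ds[med] > th_med),
                      key=lambda j: ds[j], reverse=True)[:max_replaced]
    for j in outliers:
        out[j] = list(lsf_vectors[med])
    return out, med
```

For seven frames in which one vector lies far from six identical ones, the far vector is the only one whose ratio exceeds 2.25, so it alone is replaced.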

[0132] The averaged LSF parameter vector $f^{mean}(n)$ at frame n is computed according to the equation:

$$f^{mean}(n) = \frac{1}{7} \sum_{i=0}^{6} f(n-i) \qquad (7)$$

where f(n-i) is the LSF parameter vector of one of the seven most recent frames (i = 0...6) after performing the median
replacement, i is the averaging period index, and n is the frame index.

[0133] The averaged LSF parameter vector $f^{mean}(n)$ at frame n is preferably quantized using the same quantization
tables that are also used by the speech coder for the quantization of the non-averaged LSF parameter vectors in the
normal speech encoding mode, but the quantization algorithm is modified in order to support the quantization of comfort
noise. The LSF prediction residual to be quantized is obtained according to the following equation:

$$r(n) = f^{mean}(n) - \hat{f}^{ref} \qquad (8)$$

where $f^{mean}(n)$ is the averaged LSF parameter vector at frame n, $\hat{f}^{ref}$ is the reference LSF parameter vector, r(n) is
the computed LSF prediction residual vector at frame n, and n is the frame index.

[0134] The computation of the reference LSF parameter vector $\hat{f}^{ref}$ is made on the basis of the quantized LSF parameters
$\hat{f}$ by averaging these parameters over the hangover period of six frames according to the following equation:

$$\hat{f}^{ref} = \frac{1}{6} \sum_{i=1}^{6} \hat{f}(n-i) \qquad (9)$$

where $\hat{f}(n-i)$ is the quantized LSF parameter vector of one of the frames of the hangover period (i = 1...6), i is the
hangover period frame index, and n is the frame index. It should be noted that the quantized LSF parameter vectors
$\hat{f}(n-i)$ used for computing $\hat{f}^{ref}$ are not subjected to median replacement prior to averaging.
[0135] For each CN generation period the computation of the reference LSF parameter vector $\hat{f}^{ref}$ is done only once
at the end of the hangover period, and for the rest of the CN generation period $\hat{f}^{ref}$ is frozen. The reference LSF
parameter vector $\hat{f}^{ref}$ is evaluated in the decoder in the same way as in the encoder, because during the hangover
period the same LSF parameter vectors $\hat{f}$ are available at the encoder and decoder. An exception to this are the cases
when transmission errors are severe enough to cause the parameters to become unusable, and a frame substitution
procedure is activated. In these cases, the modified parameters obtained from the frame substitution procedure are
used instead of the received parameters.
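
As a rough illustration of why the reference vector need not be transmitted, the sketch below forms the averaged vector, the reference vector, and the residual as described in paragraphs [0132] to [0134], and then rebuilds the averaged vector on the decoder side from the residual and a locally computed reference. Quantization of the residual is omitted, and the function names are hypothetical.

```python
def mean_vector(vectors):
    """Element-wise average of equally long vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def encode_residual(recent_frames, hangover_quantized):
    """Encoder side: residual of the averaged LSF vector.

    recent_frames: the seven most recent LSF vectors (after median replacement).
    hangover_quantized: the six quantized LSF vectors of the hangover period.
    """
    f_mean = mean_vector(recent_frames)             # seven-frame average
    f_ref = mean_vector(hangover_quantized)         # six-frame reference
    return [m - r for m, r in zip(f_mean, f_ref)]   # residual to be quantized


def decode_mean(residual, hangover_quantized):
    """Decoder side: rebuild the averaged LSF vector from the residual.

    The decoder holds the same quantized hangover vectors, so it can form
    the same reference vector without receiving it.
    """
    f_ref = mean_vector(hangover_quantized)
    return [r + ref for r, ref in zip(residual, f_ref)]
```

Because both sides average the same six quantized vectors, only the residual has to travel over the air interface in this sketch.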

[0136] The random excitation gain is computed for each subframe, based on the energy of the LP residual signal of
the subframe, according to the following equation:

$$g_{cn}(j) = 1.286 \sqrt{\frac{\sum_{l=0}^{39} r(l)^2}{10}} \qquad (10)$$

where $g_{cn}(j)$ is the computed random excitation gain of subframe j, r(l) is the lth sample of the LP residual of subframe
j, and l is the sample index (l = 0...39). The scaling factor of 1.286 is used to make the level of the comfort noise match
that of the background noise coded by the speech codec. The use of this particular scaling factor value should not be
read as a limitation of the practice of this invention.

[0137] The computed energy of the LP residual signal is divided by the value of 10 to yield the energy for one random
excitation pulse, since during comfort noise generation the subframe excitation signal (pseudo noise) has 10 non-zero
samples, whose amplitudes can take values of +1 or -1.
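
The gain computation of paragraphs [0136] and [0137] reduces to a few lines. The sketch below is only an illustration; the function name and the keyword parameters are assumptions, while the constants 1.286 and 10 come from the text.

```python
import math


def random_excitation_gain(residual, scale=1.286, pulses=10):
    """Gain for one subframe from the energy of its LP residual samples."""
    energy = sum(sample * sample for sample in residual)
    # Dividing by the pulse count gives the energy carried by each of the
    # +1/-1 pulses of the random excitation.
    return scale * math.sqrt(energy / pulses)
```

For a 40-sample residual holding ten samples of amplitude 2 and thirty zeros, the energy per pulse is 4, so the gain is 1.286 times 2.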

[0138] The computed random excitation gain values are averaged and updated in the first subframe of each frame 
n marked with SP = "0", when an updated set of CN parameters is required, according to the equation: 



where $g_{cn}(n)(1)$ is the computed random excitation gain at the first subframe of frame n, $g_{cn}(n-i)(j)$ is the computed
random excitation gain at subframe j of one of the past frames (i = 1...6), and n is the frame index. Since the random
excitation gain of only the first subframe of the current frame is used in the averaging, it is possible to make the updated
set of CN parameters available for transmission after the first subframe of the current frame has been processed.
[0139] The averaged random excitation gain is bounded by $g_{cn}^{mean} \leq 4032.0$ and quantized with an 8-bit non-uniform
algorithmic quantizer in the logarithmic domain, requiring no storage of a quantization table.

[0140] With regard to the computation of RESC parameters, since the LP residual r(n) deviates somewhat from flat 
spectral characteristics, some loss in comfort noise quality (spectral mismatch between the background noise and the 
comfort noise) will result when a spectrally flat random excitation is used for synthesizing comfort noise on the receive 
side. To provide an improved spectral match, a further second order LP analysis is performed for the LP residual signal 
over the CN averaging period, and the resulting averaged LP coefficients are transmitted to the receive side in the CN 
parameter message to be used in the comfort noise generation. This method is referred to as the random excitation 
spectral control (RESC), and the obtained LP coefficients are referred to as the RESC parameters $\Lambda$.
[0141] The LP residual signals r(n) of each subframe in a frame are concatenated to compute the autocorrelations
$r_{res}(k)$, k = 0...2, of the LP residual signal of the 20 ms frame according to the equation:

$$r_{res}(k) = \sum_{n} r(n)\, r(n-k), \qquad k = 0 \ldots 2 \qquad (12)$$

[0142] After computing the autocorrelations according to the foregoing equation, the autocorrelations are normalized
to obtain the normalized autocorrelations $r'_{res}(k)$.

[0143] For the most recent frame of the CN averaging period, the autocorrelations from only the first subframe are
used for averaging to make it possible to prepare the updated set of CN parameters for transmission after the first
subframe of the current frame has been processed.

[0144] The computed normalized autocorrelations are averaged and updated in the first subframe of each frame n
marked with SP = "0", when an updated set of CN parameters is required, according to the equation:

$$r^{mean}(n) = \frac{1}{7}\, r'_{res}(n)(1) + \frac{1}{7} \sum_{i=1}^{6} r'_{res}(n-i) \qquad (13)$$

where $r'_{res}(n)(1)$ are the normalized autocorrelations at the first subframe of frame n, $r'_{res}(n-i)$ are the normalized
autocorrelations of one of the past frames (i = 1...6), and n is the frame index.

[0145] The computed averaged autocorrelations $r^{mean}$ are input to a Schur recursion algorithm to compute the two
first reflection coefficients, i.e., the RESC parameters $\lambda(i)$, i = 1, 2. Each of the two RESC parameters is encoded
using a 2-bit scalar quantizer.
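
The RESC analysis chain (autocorrelation, normalization, Schur recursion) can be sketched as below. This is an illustration under assumptions, not the codec's routine: the function names are hypothetical, the sum limits of the autocorrelation are chosen so that only samples inside the buffer are used, and the sign convention of the reflection coefficients (negative for a positively correlated residual) is an assumption that the source does not specify.

```python
def autocorrelations(residual, max_lag=2):
    """Autocorrelations of the residual for lags 0..max_lag."""
    return [sum(residual[n] * residual[n - k] for n in range(k, len(residual)))
            for k in range(max_lag + 1)]


def normalize(acf):
    """Scale the autocorrelations so that lag 0 equals one."""
    if acf[0] == 0:
        return [0.0] * len(acf)       # silent residual: nothing to normalize
    return [value / acf[0] for value in acf]


def schur_reflection(acf, order=2):
    """Reflection coefficients from autocorrelations via a Schur recursion."""
    e = list(acf)   # forward generator
    b = list(acf)   # backward generator
    coeffs = []
    for m in range(1, order + 1):
        if b[m - 1] == 0:
            coeffs.extend([0.0] * (order - m + 1))
            break
        k = -e[m] / b[m - 1]
        coeffs.append(k)
        new_e, new_b = e[:], b[:]
        for n in range(m, order + 1):
            new_e[n] = e[n] + k * b[n - 1]
            new_b[n] = b[n - 1] + k * e[n]
        e, b = new_e, new_b
    return coeffs
```

A spectrally flat residual (autocorrelations 1, 0, 0) yields two zero coefficients, which is the case in which a flat random excitation would already match the residual spectrum.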

[0146] The modification of the speech encoding algorithm during DTX operation is as follows. When the SP flag is
equal to "0" the speech encoding algorithm is modified in the following way. The non-averaged LP parameters which
are used to derive the filter coefficients of the short-term synthesis filter H(z) of the speech encoder are not quantized,
and the memory of the weighting filter W(z) is not updated, but rather set to zero. The open loop pitch lag search is performed,
but the closed loop pitch lag search is inactivated and the adaptive codebook gain is set to zero. If the VAD implementation
does not use the delay parameter of the adaptive codebook for making the VAD decision, the open loop pitch
lag search can also be switched off. No fixed codebook search is performed. In each subframe the fixed codebook
excitation vector of the normal speech decoder is replaced by a random excitation vector which contains 10 non-zero
pulses. The random excitation generation algorithm is defined below. The random excitation is filtered by the RESC
synthesis filter, as described below, to keep the contents of the past excitation buffer as nearly equal as possible in
both the encoder and the decoder, to enable a fast startup of the adaptive codebook search when the speech activity
begins after the comfort noise generation period. The LP parameter quantization algorithm of the speech encoding
mode is inactivated. At the end of the hangover period the reference LSF parameter vector $\hat{f}^{ref}$ is calculated as defined
above. For the remainder of the comfort noise insertion period $\hat{f}^{ref}$ is frozen. The averaged LSF parameter vector $f^{mean}$
is calculated each time a new set of CN parameters is to be prepared. This parameter vector is encoded into the CN
parameter message as defined above. The excitation gain quantization algorithm of the speech encoding mode
is also inactivated. The averaged random excitation gain value $g_{cn}^{mean}$ is calculated each time a new set of CN parameters
is to be prepared. This gain value is encoded into the CN parameter message as previously defined. The computation
of the random excitation gain is performed based on the energy of the LP residual signal, as defined above.
The predictor memories of the ordinary LP parameter quantization and fixed codebook gain quantization algorithms
are reset when the SP flag = "0", so that the quantizers start from their initial states when the speech activity begins
again. Finally, the computation of the RESC parameters is based on the spectral content of the LP residual signal,
as defined above. The RESC parameters are computed each time a new set of CN parameters is to be prepared.
[0147] The comfort noise encoding algorithm produces 38 bits for each CN parameter message as shown in Table
2. These bits are referred to as vector cn[0...37]. The comfort noise bits cn[0...37] are delivered to the FACCH channel
encoder in the order presented in Table 2 (i.e., no ordering according to the subjective importance of the bits is performed).



Table 2: Detailed bit allocation of comfort noise parameters

Index (vector to FACCH channel encoder)    Description                    Parameter
cn0-cn7                                    Index of 1st LSF subvector     VQ index of r[1...3]
cn8-cn16                                   Index of 2nd LSF subvector     VQ index of r[4...6]
cn17-cn25                                  Index of 3rd LSF subvector     VQ index of r[7...10]
cn26-cn33                                  Random excitation gain         Index of $g_{cn}^{mean}$
cn34-cn35                                  Index of 1st RESC parameter    Index of $\lambda(1)$
cn36-cn37                                  Index of 2nd RESC parameter    Index of $\lambda(2)$
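
The bit allocation of Table 2 can be illustrated with a small packing and unpacking sketch. The field widths follow the table; the field names, the dictionary interface, and the most-significant-bit-first ordering inside each field are assumptions, since the source does not state the bit order within a field.

```python
# Field widths in the order of Table 2 (names are hypothetical labels).
FIELD_WIDTHS = [("lsf1", 8), ("lsf2", 9), ("lsf3", 9),
                ("gain", 8), ("resc1", 2), ("resc2", 2)]


def pack_cn(indices):
    """Pack quantizer indices into the 38-bit vector cn[0..37]."""
    bits = []
    for name, width in FIELD_WIDTHS:
        value = indices[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        # Assumed ordering: most significant bit of each field first.
        bits.extend((value >> shift) & 1 for shift in range(width - 1, -1, -1))
    return bits


def unpack_cn(bits):
    """Recover the quantizer indices from a 38-bit vector."""
    if len(bits) != 38:
        raise ValueError("a CN parameter message carries 38 bits")
    indices, position = {}, 0
    for name, width in FIELD_WIDTHS:
        value = 0
        for bit in bits[position:position + width]:
            value = (value << 1) | bit
        indices[name] = value
        position += width
    return indices
```

The three LSF fields occupy 8 + 9 + 9 = 26 bits, the gain index 8 bits, and the two RESC indices 2 bits each, which gives the 38 bits stated in the text.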

[0148] Regardless of their context (speech, CN parameter message, other FACCH messages or none), the radio
receiver of the base station 30 continuously passes the received traffic frames to the receive side DTX handler,
individually marked by various preprocessing functions with three flags. These are the speech frame Bad Frame Indicator
(BFI) flag, the comfort noise parameter Bad Frame Indicator (BFI_CN) flag, and the Comfort Noise Update Flag (CNU)
described below and in Table 3. These flags serve to classify the traffic frames according to their purpose. This
classification, summarized in Table 3, allows the receive side DTX handler to determine in a simple way how the received
frame is to be processed.



Table 3: Classification of traffic frames

             BFI_CN = 0                    BFI_CN = 1
BFI = 0      Invalid combination           Good speech frame
BFI = 1      Valid CN parameter message    Unusable frame

The binary BFI and BFI_CN flags indicate whether the traffic frame is considered to contain meaningful information
bits (BFI flag = "0" and BFI_CN flag = "1", or BFI flag = "1" and BFI_CN flag = "0") or not (BFI flag = "1" and BFI_CN
flag = "1", or BFI flag = "0" and BFI_CN flag = "0"). In the context of this disclosure, a FACCH frame is considered not
to contain meaningful bits unless it contains a CN parameter message, and is thus marked with BFI flag = "1" and
BFI_CN flag = "1".

[0149] The binary CNU flag marks with CNU = "1" those traffic frames that are aligned with the transmission instances
of the channel quality information sent over the FACCH.

[0150] The receive side DTX handler is responsible for the overall DTX operation on the receive side. The DTX
operation on the receive side is as follows: whenever a good speech frame is detected, the DTX handler passes it
directly on to the speech decoder; when lost speech frames or lost CN parameter messages are detected, the substitution
and muting procedure is applied; valid CN parameter message frames result in comfort noise generation until
the next CN parameter message is expected (CNU = "1") or good speech frames are detected. During this period, the
receive side DTX handler ignores any unusable frames delivered by the radio receiver. The following two operations
are optional: the parameters of the first lost CN parameter message are substituted by the parameters of the last valid
CN parameter message and the procedure for the CN parameter message is applied; and upon reception of a second
lost CN parameter message, muting is applied.
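
The frame classification of Table 3 and the handler decisions of paragraph [0150] can be sketched as a lookup followed by a small decision function. The function names and returned strings are hypothetical, and treating an unusable frame that arrives outside comfort noise generation as a lost speech frame is an assumption made for this illustration.

```python
FRAME_CLASSES = {
    (0, 0): "invalid combination",
    (0, 1): "good speech frame",
    (1, 0): "valid CN parameter message",
    (1, 1): "unusable frame",
}


def classify_frame(bfi, bfi_cn):
    """Classify a traffic frame from its BFI and BFI_CN flags (Table 3)."""
    return FRAME_CLASSES[(bfi, bfi_cn)]


def receive_action(bfi, bfi_cn, generating_cn, cnu):
    """Pick the receive side DTX handler action for one traffic frame."""
    kind = classify_frame(bfi, bfi_cn)
    if kind == "good speech frame":
        return "decode speech"
    if kind == "valid CN parameter message":
        return "generate comfort noise"
    if kind == "unusable frame":
        if generating_cn and cnu == 0:
            return "ignore"              # no CN update expected in this frame
        return "substitute and mute"     # lost CN message or lost speech frame
    raise ValueError("invalid flag combination")
```

An unusable frame is ignored while comfort noise is being generated and no update is due, but it counts as a lost CN parameter message when CNU = "1".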

[0151] With regard to the averaging and decoding of the LP parameters, when speech frames are received by the
decoder the LP parameters of the last six speech frames are kept in memory. The decoder counts the number of frames
elapsed since the last set of CN parameters was updated and passed to the radio transmitter by the encoder. Based
on this count the decoder determines whether or not there is a hangover period at the end of the speech burst (if at
least 30 frames have elapsed since the last CN parameter update when the first CN parameter message after a speech
burst arrives, the hangover period is determined to have existed at the end of the speech burst).
[0152] As soon as a CN parameter message is received, and the hangover period is detected at the end of the
speech burst, the stored LP parameters are averaged to obtain the reference LSF parameter vector $\hat{f}^{ref}$. The reference
LSF parameter vector is frozen and used for the actual comfort noise generation period.
[0153] The averaging procedure for obtaining the reference parameters is as follows:

[0154] When a speech frame is received, the LSF parameters are decoded and stored in memory. When the first
CN parameter message is received, and the hangover period is detected at the end of the speech burst, the stored
LSF parameters are averaged in the same way as in the speech encoder as follows:

$$\hat{f}^{ref} = \frac{1}{6} \sum_{i=1}^{6} \hat{f}(n-i) \qquad (14)$$

where $\hat{f}(n-i)$ is the quantized LSF parameter vector of one of the frames of the hangover period (i = 1...6), and n is the
frame index.

[0155] Once the reference LSF parameter vector has been computed, the averaged LSF parameter vector $\hat{f}^{mean}(n)$
at frame n (encoded into the CN parameter message) can be reproduced at the decoder each time a CN update
message is received according to the equation:

$$\hat{f}^{mean}(n) = \hat{r}(n) + \hat{f}^{ref} \qquad (15)$$

where $\hat{f}^{mean}(n)$ is the quantized averaged LSF parameter vector at frame n, $\hat{f}^{ref}$ is the reference LSF parameter vector,
$\hat{r}(n)$ is the received quantized LSF prediction residual vector at frame n, and n is the frame index.
[0156] In each subframe, the fixed codebook excitation vector of the normal speech decoder containing four non- 
zero pulses is replaced during speech inactivity by a random excitation vector which contains 10 non-zero pulses. The 
pulse positions and signs of the random excitation are locally generated using uniformly distributed pseudo-random 
numbers. The excitation pulses take values of +1 and -1 in the random excitation vector. The random excitation gen- 
eration algorithm operates in accordance with the following pseudo-code. 



Pseudo-Code:

    /* clear the 40-sample excitation buffer */
    for (i = 0; i < 40; i++)
        code(i) = 0;

    /* place one pulse of random sign in each of 10 interleaved tracks */
    for (i = 0; i < 10; i++) {
        j = random(4);
        idx = j * 10 + i;
        if (random(2) == 1)
            code(idx) = 1;
        else
            code(idx) = -1;
    }

where code[0...39] is the fixed codebook excitation buffer, and random(k) generates pseudo-random integer values,
uniformly distributed over the range [0...k-1].
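
A runnable rendering of the random excitation generation is given below as a sketch. The function name and the use of an explicit `random.Random` instance are assumptions; the source only requires uniformly distributed pseudo-random integers. The index rule `j * 10 + i` gives each pulse its own track of four candidate positions, so the ten pulses can never collide.

```python
import random


def random_excitation(rng):
    """Build a 40-sample excitation with ten pulses of value +1 or -1."""
    code = [0] * 40
    for i in range(10):
        j = rng.randrange(4)          # uniform over 0..3
        idx = j * 10 + i              # track i holds positions i, i+10, i+20, i+30
        code[idx] = 1 if rng.randrange(2) == 1 else -1
    return code
```

Calling `random_excitation(random.Random(0))` returns one such vector; whatever the seed, exactly one non-zero pulse falls in each of the ten tracks.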

[0157] The received RESC parameter indices are decoded to obtain the received RESC parameters $\lambda(i)$, i = 1, 2. After
the random excitation has been generated, it is filtered by the RESC synthesis filter, defined as follows:



= / 06) 



[0158] The RESC synthesis filter is preferably implemented using a lattice filtering method. After RESC synthesis 
filtering, the random excitation is subjected to scaling and LP synthesis filtering. 

[0159] The comfort noise generation procedure uses the speech decoder algorithm with the following modifications. 
The fixed codebook gain values are replaced by the random excitation gain value received in the CN parameter mes- 
sage, and the fixed codebook excitation is replaced by the locally generated random excitation as was described above. 
The random excitation is filtered by the RESC synthesis filter, as was also described above. The adaptive codebook 
gain value in each subframe is set to 0. The pitch delay value in each subframe is set to, for example, 60. The LP filter 
parameters used are those received in the CN parameter message. The predictor memories of the ordinary LP pa- 
rameter and fixed codebook gain quantization algorithms are reset when the SP flag = "0", so that the quantizers start 
from their initial states when the speech activity begins again. With these parameters, the speech decoder now performs 
its standard operations and synthesizes comfort noise. Updating of the comfort noise parameters (random excitation 
gain, RESC parameters, and LP filter parameters) occurs each time a valid CN parameter message is received, as 
described above. When updating the comfort noise, the foregoing parameters are interpolated over the CN update 
period to obtain smooth transitions. 
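
The text does not state which interpolation rule is used. The following minimal sketch assumes plain linear interpolation between the previous and the most recently received parameter sets. The function name `interpolate_cn_parameters`, its arguments and the example update period of 4 frames are hypothetical.

```python
def interpolate_cn_parameters(previous, latest, frame_index, update_period):
    """Blend two comfort noise parameter vectors (assumed linear rule).

    previous, latest: equal-length lists of parameter values (for example
    LP filter parameters, RESC parameters, or a one-element list with the
    random excitation gain).
    frame_index: 0-based frame position within the CN update period.
    update_period: number of frames between CN parameter updates.
    """
    if update_period <= 0:
        raise ValueError("update_period must be positive")
    # The weight grows from 1/update_period to 1 across the update period.
    weight = min(max((frame_index + 1) / update_period, 0.0), 1.0)
    return [(1.0 - weight) * p + weight * n for p, n in zip(previous, latest)]


# Assumed usage: halfway through a hypothetical 4-frame update period.
halfway = interpolate_cn_parameters([0.0, 10.0], [4.0, 2.0], 1, 4)
```

At the last frame of the period the weight reaches 1, so the output equals the most recently received parameters.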

[0160] A lost CN parameter message is defined as an unusable frame that is received when the receive side DTX 
handler is generating comfort noise and a CN parameter message is expected (Comfort Noise Update flag, CNU = "1").
[0161] The parameters of a single lost CN parameter message are substituted by the parameters of the last valid 
CN parameter message and the procedure for valid CN parameters is applied. For the second lost CN parameter 
message, a muting technique is used for the comfort noise that gradually decreases the output level (-3 dB/frame), 
resulting in eventual silencing of the output of the decoder. The muting is accomplished by decreasing the random
excitation gain by a constant 3 dB in each frame down to a minimum value of 0. This value is maintained if
additional lost CN parameter messages occur. 
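
The substitution and muting behaviour can be sketched as below. This is an illustration under stated assumptions: the gain is held as a linear amplitude, a 3 dB step is applied as the amplitude ratio 10^(-3/20), and a hypothetical floor `SILENCE_FLOOR` stands in for the point at which a quantized gain would reach 0. The class and method names are likewise assumptions.

```python
MUTE_STEP_DB = -3.0
MUTE_FACTOR = 10.0 ** (MUTE_STEP_DB / 20.0)  # amplitude ratio for -3 dB
SILENCE_FLOOR = 1e-3                         # hypothetical threshold treated as gain 0


class CnLossConcealer:
    """Tracks the random excitation gain while CN parameter messages are lost."""

    def __init__(self, gain):
        self.gain = gain        # gain from the last valid CN parameter message
        self.lost_count = 0     # consecutive lost CN parameter messages
        self.muting = False

    def on_valid_message(self, gain):
        self.gain = gain
        self.lost_count = 0
        self.muting = False

    def on_lost_message(self):
        # First loss: keep using the last valid parameters.
        # Second loss onwards: start (or continue) muting.
        self.lost_count += 1
        if self.lost_count >= 2:
            self.muting = True

    def next_frame_gain(self):
        """Return the gain for the next frame, lowered by 3 dB while muting."""
        if self.muting:
            self.gain *= MUTE_FACTOR
            if self.gain < SILENCE_FLOOR:
                self.gain = 0.0
        return self.gain
```

Once the gain has reached 0 it stays there until a valid CN parameter message arrives.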

[0162] Although a number of presently preferred embodiments of this invention have been described with respect 
to specific values of frame durations, numbers of frames, specific message types (e.g., FACCH) and the like, it should
be realized that the numbers of frames, duration of frames, duration of the hangover period, duration of the averaging
period, message types, etc., may be varied in accordance with the specifications and requirements of different types 
of digital mobile communications systems. Furthermore, and although the invention has been described in the context 
of circuit block diagrams, such as those shown in Figs. 2a, 2b, 3a, 3b, 4, 5, and 10, it will be appreciated that some of 
the illustrated circuit blocks are implemented by a suitably programmed digital data processor (e.g., the controller 18
of Fig. 12) that forms a portion of the digital cellular telephone 10. By example only, the selectors 307, 319 and 410 of
Figs. 4 and 5, although shown as switches, may be implemented wholly in software.

[0163] Also, it is noted that there are Comfort Noise generation schemes in some systems where spare bits are not 
available in the CN parameter message (or SID frame) for transmitting the RESC parameters from the transmit side 
to the receive side. In those cases, the RESC filter could be replaced by a synthesis filter with fixed coefficients. The
fixed filter coefficients are then optimized to cause the frequency response of the synthesis filter to approximate the average
response of the normal RESC filter with transmitted coefficients. The filter coefficients could also be selected to give
a filter response which provides a perceptually (subjectively) preferred quality of comfort noise.
[0164] Thus, while the invention has been particularly shown and described with respect to preferred embodiments
thereof, it will be understood by those skilled in the art that changes in form and details may be made therein without
departing from the scope of the invention as defined by the appended claims.

Claims 

1. A method for generating comfort noise (CN) in a digital mobile terminal that uses a discontinuous transmission,
comprising the steps of: 

in response to a speech pause, buffering a set of speech coding parameters; 

within an averaging period, replacing speech coding parameters of the set that are not representative of background
noise with speech coding parameters that are representative of the background noise; and

averaging the set of speech coding parameters. 

2. A method as in claim 1, wherein the step of replacing includes the steps of:

measuring distances of the speech coding parameters from one another between individual frames within the 
averaging period; 

identifying those speech coding parameters which have the largest distances to the other parameters within
the averaging period; and

if the distances exceed a predetermined threshold, replacing an identified speech coding parameter with a
speech coding parameter which has a smallest measured distance to the other speech coding parameters
within the averaging period.

3. A method as in claim 1, wherein the step of replacing includes the steps of:

measuring distances of the speech coding parameters from one another between individual frames within the
averaging period;

identifying those speech coding parameters which have the largest distances to the other parameters within
the averaging period; and

if the distances exceed a predetermined threshold, replacing an identified speech coding parameter with a
speech coding parameter having a median value.

4. A method as in claim 1, wherein the step of averaging includes a step of computing an average excitation gain
g_mean and average short term spectral coefficients f_mean(i).

5. A method as in claim 1, wherein the step of replacing includes steps of:

forming a set of buffered excitation gain values over the averaging period;

ordering the set of buffered excitation gain values; and

performing a median replacement operation in which those L excitation gain values differing the most from
the median value, where the difference exceeds a predetermined threshold value, are replaced by the median
value of the set.

6. A method as in claim 5, wherein a length N of the averaging period is an odd number, and wherein the median of
the ordered set is the ((N+1)/2)th element of the set.

7. A method as in claim 1, and further comprising steps of:

forming a set of buffered Line Spectral Pair (LSP) coefficients f(k), k=1,...,M over the averaging period; and

determining a spectral distance of the LSP coefficients f_i(k) of the ith frame in the averaging period, to the LSP
coefficients f_j(k) of the jth frame in the averaging period.

8. A method as in claim 7, where the step of determining the spectral distance is accomplished in accordance with
the expression

$$\Delta R_{ij} = \sum_{k=1}^{M} \left( f_i(k) - f_j(k) \right)^2,$$

where M is the degree of the LPC model, and f_i(k) is the kth LSP parameter of the ith frame in the averaging period.

9. A method as in claim 7, and further comprising a step of determining the spectral distance ΔS_i of the LSP coefficients
f_i(k) of frame i to the LSP coefficients of all the other frames j=1,...,N, i≠j, within the averaging period of length N.

10. A method as in claim 9, wherein the step of determining the spectral distance is accomplished by determining the
sum of the spectral distances ΔR_ij in accordance with

$$\Delta S_i = \sum_{j=1,\, j \neq i}^{N} \Delta R_{ij}$$

for all i=1,...,N.

11. A method as in claim 9, and further comprising steps of:

after the spectral distances ΔS_i have been found for each of the LSP vectors f_i within the averaging period,
ordering the spectral distances according to their values;

considering a vector f_i with the smallest distance ΔS_i within the averaging period i=1,2,...,N to be a median
vector f_med of the averaging period having a distance denoted as ΔS_med; and

performing a median replacement of P (0<P<N-1) LSP vectors f_i with the median vector f_med.

12. A method as in claim 2, wherein the steps of identifying and replacing are performed independently for excitation
gain values g and Line Spectral Pair (LSP) vectors f_i.

13. A method as in claim 2, wherein the steps of identifying and replacing are combined together for excitation gain
values g and Line Spectral Pair (LSP) vectors f_i.

14. A method as in claim 13, comprising steps of:

in response to determining that the speech coding parameters in an individual frame are to be replaced by
median values of the parameters, replacing both the excitation gain value g and the LSP vector f_i of that frame
by the respective parameters of the frame containing the median parameters.

15. A method as in claim 14, and comprising initial steps of:

determining a distance ΔT_ij between the parameters of the ith frame and the jth frame of the averaging period
in accordance with the expression

$$\Delta T_{ij} = \sum_{k=1}^{M} \left( f_i(k) - f_j(k) \right)^2 + w \left( g_i - g_j \right)^2,$$

where M is the degree of the LPC model, f_i(k) is the kth LSP parameter of the ith frame of the averaging period,
and g_i is the excitation gain parameter of the ith frame.

16. A method as in claim 15, and further comprising a step of:

determining a distance ΔS_i of the speech coding parameters of frame i, for all i=1,...,N, to the speech coding
parameters of all the other frames j=1,...,N, i≠j, within the averaging period of length N, in accordance with

$$\Delta S_i = \sum_{j=1,\, j \neq i}^{N} \Delta T_{ij}$$

for all i=1,...,N.

17. A method as in claim 16, wherein after the distances ΔS_i have been determined for each of the frames within the
averaging period, further comprising steps of:

ordering the distances according to their values; and

considering a frame with the smallest distance ΔS_i within the averaging period i=1,2,...,N as a median frame,
having distance ΔS_med, of the averaging period, the median frame having speech coder parameters g_med and
f_med.

18. A method as in claim 17, and comprising a step of performing median replacement on the speech coding parameter
frames within the averaging period i=1,2,...,N wherein parameters g_i and f_i of L (0<L<N-1) frames are replaced by
the parameters g_med and f_med of the median frame.

19. A method as in claim 17, wherein differences between each individual distance and the median distance are
determined by dividing an individual distance by the median distance in accordance with ΔS_i/ΔS_med.

20. A method as in claim 11, wherein differences between each individual distance and the median distance are
determined by dividing an individual distance by the median distance in accordance with ΔS_i/ΔS_med.

21. Apparatus for generating comfort noise (CN) in a system having a digital mobile terminal that uses a discontinuous 
transmission to a network, comprising: 

data processing means in said digital mobile terminal that is responsive to a speech pause for buffering a set 
of speech coding parameters and, within an averaging period, for replacing speech coding parameters of the 
set that are not representative of background noise with speech coding parameters that are representative of 
the background noise, said data processing means averaging the set of speech coding parameters and trans- 
mitting the averaged set of speech coding parameters to the network. 

22. Apparatus as in claim 21, wherein said data processor replaces speech coding parameters of the set by ordering
the set and measuring distances of the speech coding parameters from one another between individual frames 
within the averaging period, by identifying those speech coding parameters which have the largest distances to 
the other parameters within the averaging period; and, if the distances exceed a predetermined threshold, by 
replacing the identified speech coding parameters with a speech coding parameter which has a smallest measured 
distance to the other speech coding parameters within the averaging period. 

23. Apparatus as in claim 21, wherein said data processor replaces speech coding parameters of the set by ordering
the set and measuring distances of the speech coding parameters from one another between individual frames 
within the averaging period; by identifying those speech coding parameters which have the largest distances to 
the other parameters within the averaging period; and, if the distances exceed a predetermined threshold, by 
replacing an identified speech coding parameter with a speech coding parameter having a median value. 

24. Apparatus as in claim 21, wherein said data processing means identifies and replaces speech coding parameters
independently for excitation gain values g and Line Spectral Pair (LSP) vector f_i.

25. Apparatus as in claim 21, wherein said data processing means identifies and replaces speech coding parameters
together for excitation gain values g and Line Spectral Pair (LSP) vector f_i.
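
By way of illustration only, the combined median replacement and averaging recited in claims 1 and 15 to 19 might be sketched as follows. The function names, the weighting value `w`, the ratio `threshold` and the limit `max_replaced` are hypothetical inputs chosen for the example; the claims do not fix their values.

```python
def frame_distance(f_i, g_i, f_j, g_j, w):
    """Distance between frames i and j: squared LSP differences plus weighted gain term."""
    return sum((a - b) ** 2 for a, b in zip(f_i, f_j)) + w * (g_i - g_j) ** 2


def median_replace_and_average(lsp, gain, w, threshold, max_replaced):
    """Replace outlier frames by the median frame, then average the parameters.

    lsp: list of N LSP vectors (each of length M); gain: list of N gains.
    Returns (g_mean, f_mean).
    """
    n = len(gain)
    # Total distance of every frame to all other frames in the averaging period.
    total = [
        sum(frame_distance(lsp[i], gain[i], lsp[j], gain[j], w)
            for j in range(n) if j != i)
        for i in range(n)
    ]
    med = min(range(n), key=total.__getitem__)          # median frame index
    order = sorted(range(n), key=total.__getitem__, reverse=True)

    lsp = [list(v) for v in lsp]
    gain = list(gain)
    for i in order[:max_replaced]:
        # Ratio test total[i] / total[med] > threshold, written without division.
        if i != med and total[i] > threshold * total[med]:
            lsp[i] = list(lsp[med])
            gain[i] = gain[med]

    g_mean = sum(gain) / n
    f_mean = [sum(v[k] for v in lsp) / n for k in range(len(lsp[0]))]
    return g_mean, f_mean
```

With five buffered frames of which one contains speech-like parameters, the outlier frame is replaced by the median frame before averaging, so the averaged parameters reflect only the background noise frames.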



Patentanspruche 

1 . Verfahren zum Erzeugen von Hintergrund-Beruhigungsrauschen (CN) in einem digitalen Mobittermlnal unter Ver- 
wendung diskontinuierlicher Ubertragung, mit den folgenden Schritten: 

Puffern, in Reaktion auf eine Sprechpause, eines Satzes von Sprachcodierparametem; 
Ersetzen, innerhalb einer Mittelungsperiode, von Sprachcodierparametem des Satzes, die nicht fur Hinter- 
grundrauschen reprasentativ sind, durch Sprachcodierparameter, die fur Hintergrundrauschen reprasentativ 
sind; und 

Mitteln des Satzes von Sprachcodierparametem. 

2. Verfahren nach Anspruch 1 , be! dem der Ersetzungsschritt die folgenden Schritte beinhaltet: 

Messen von Abstanden der Sprachcodierparameter voneinander zwischen einzelnen Rahmen innerhalb der 
Mittelungsperiode; 

Identifizieren derjenigen Sprachcodierparameter mit den groftten Abstanden zu den anderen Parametern in- 
nerhalb der Mittelungsperiode; und 

Ersetzen, wenn die Abstande einen vorbestimmten Schwellenwert Qberschreiten, eines identifizierten Sprach- 
codierparameters durch denjenigen mit dem kleinsten gemessenen Abstand zu den anderen Sprachcodier- 
parametem innerhalb der Mittelungsperiode. 

3. Verfahren nach Anspruch 1 , bei dem der Ersetzungsschritt die folgenden Schritte beinhaltet: 

Messen von Abstanden der Sprachcodierparameter voneinander zwischen einzelnen Rahmen innerhalb der 
Mittelungsperiode; 

Identifizieren derjenigen Sprachcodierparameter mit den größten Abständen zu den anderen Parametern
innerhalb der Mittelungsperiode; und

Ersetzen, wenn die Abstande einen vorbestimmten Schwellenwert uberschreiten, eines identifizierten Sprach- 
codierparameters durch einen Sprachcodierparameter durch den Medianswert. 

4. Verfahren nach Anspruch 1 , bei dem der Mittelungsschritt einen Schritt des Berechnens einer mittleren Erregungs- 
verstarkung g^^ und mittlerer Kurzzeit-Spektralkoeffizienten f^a^i) beinhaltet. 

5. Verfahren nach Anspruch 1 , bei dem der Ersetzungsschritt die folgenden Schritte beinhaltet: 

Erzeugen eines Satzes gepufferter Erregungsverstarkungswerte uber die Mittelungsperiode; 
Ordnen des Satzes gepufferter Erregungsverstarkungswerte; und 

Ausfuhren einer Mediansersetzungsoperation, bei der diejenigen L Erregungsverstarkungswerte, die sich am 
meisten vom Medianswert unterscheiden, wobei die Differenz einen vorbestimmten Schwellenwert uberschrei- 
tet, durch den Medianswert des Satzes ersetzt werden. 

6. Verfahren nach Anspruch 5, bei dem die Lange N der Mittelungsperiode eine ungerade Zahl ist und der Median 
des geordneten Satzes das Element ((N+1 )/2) des Satzes ist. 

7. Verfahren nach Anspruch 1 , ferner mit dem folgenden Schritt: 

Erzeugen eines Satzes gepufferter LSP(Line Spectral Pair)-Koeffizienten f (k), k = 1 , .... M uber die Mittelungs- 
periode; und 

Bestimmen des spektralen Abstands der LSP-Koeffizienten fj(k) des Rahmens i in der Mittelungsperiode zu 
den LSP-Koeffizienten fj(k) des Rahmens j in der Mittelungsperiode. 

8. Verfahren nach Anspruch 7, bei dem der Schritt des Bestimmens des spektralen Abstands gemaR der folgenden 
Gleichung ausgefuhrt wird: 



$$\Delta R_{ij} = \sum_{k=1}^{M} \left( f_i(k) - f_j(k) \right)^2,$$



wobei M derGrad des LPC-Modells ist undfj(k) derk-te LSP- Parameter des Rahmens I in der Mittelungsperiode ist. 

9. Verfahren nach Anspruch 7, femer mit dem Schritt des Bestimmens des spektralen Abstands AS ( der LSP-Koeffi- 
zienten fj(k) des Rahmens i zu den LSP-Koefflzienten alter anderen Rahmen J=1, .... N, ktj, innertialb der Mitte- 
lungsperiode der Ldnge N. 

10. Verfahren nach Anspruch 9, bei dem der Schritt des Bestimmens des spektralen Abstands dadurch ausgefuhrt 
wird, dass die Sum me der spektralen Abstdnde ARy gemaB 

$$\Delta S_i = \sum_{j=1,\, j \neq i}^{N} \Delta R_{ij}$$



furalle 1=1 ,...,N erfolgt. 

11. Verfahren nach Anspruch 9, femer mit dem folgenden Schritt: 

Ordnen, nachdem die spektralen Abstande ASj fur jeden der LSP-Vektoren f( innerhalb der Mittelungsperiode 
aufgefunden wurden, derselben entsprechend ihren Werten; 
- Verwenden des Vektors f, mit dem kleinsten Abstand AS ( innerhalb der Mittelungsperiode i=1, 2, N als 
Mediansvektor f med der Mittelungsperiode mit einem als AS med bezeichneten Abstand; und 
Ausfuhren einer Mediansersetzung von P (0 £ P < N1 -1 ) LSP-Vektoren fj durch den Mediansvektor f med . 

12. Verfahren nach Anspruch 2, bei dem die Schritte des Identrfizierens unci Ersetzens fur Erregungsverstarkungs- 
werte g und LSP(Line Spectral Pair)-Vektoren \ unabhangig ausgefuhrt werden. 

13. Verfahren nach Anspruch 2, bei dem die Schritte des Identrfizierens und Ersetzens fur Erregungsverstarkungs- 
werte g und LSP(Llne Spectral Pair)-Vektorf ( miteinander kombiniert werden. 

14. Verfahren nach Anspruch 13, mil dem folgenden Schritt: 

Ersetzen, in Reaktion auf die Ermittlung, dass die Sprachcodierparameter in einem individuellen Rahmen 
durch Medianswerte der Parameter zu ersetzen sind, sowohl des Erregungsverstarkungswerts g als auch des 
LSP-Vektors f t dieses Rahmens durch die jeweiligen Parameter des Rahmens, der die Mediansparameter 
enthalt. 

15. Verfahren nach Anspruch 14, mit den folgenden Anfangsschritten: 

Bestimmen des Abstands ATj| zwischen den Parametern des Rahmens i und des Rahmens j der Mittelungs- 
periode gemaB dem folgenden Ausdruck 



$$\Delta T_{ij} = \sum_{k=1}^{M} \left( f_i(k) - f_j(k) \right)^2 + w \left( g_i - g_j \right)^2,$$



wobei M der Grad des LPC-Modells ist, fj(k) der k-te LSP- Parameter des Rahmens i der Mittelungsperiode ist und 
gj der Erregungsverst&rkungswert des Rahmens i ist. 

30 

16. Verfahren nach Anspruch 15, femer mit dem folgenden Schritt: 

Bestimmen des Abstands ASj der Sprachcodierparameter des Rahmens i fOr alle 1-1, N mit den Sprach- 
codierparametern aller anderen Rahmen j=1 , N, fctj, innerhalb der Mittelungsperiode der Lange N gemaB 

$$\Delta S_i = \sum_{j=1,\, j \neq i}^{N} \Delta T_{ij}$$

fur alle M N. 

17. Verfahren nach Anspruch 16, femer mit den folgenden Schritten nach dem Bestimmen der Abstfinde ASj fur jeden 
45 der Rahmen Innerhalb der Mittelungsperiode: 

Ordnen der Abstande entsprechend ihren Werten und 

Verwenden des Rahmens mit dem kleinsten Abstand AS; innerhalb der Mittelungsperiode 1=1 , 2, N als 
Mediansrahmen, mit dem Abstand AS med der Mittelungsperiode, wobei der Mediansrahmen uber Sprach- 
so codlerparameter g,,,^ und f med verfugt. 

18. Verfahren nach Anspruch 17, mit dem Schritt des Ausfuhrens einer Mediansersetzung an den Sprachcodierpara- 
meter- Rahmen innerhalb der Mittelungsperiode 1=1 p 2 N t wobei die Parameter gj und fj von L(0 < L < N-1) 

Rahmen durch die Parameter g med und f med des Mediansrahmens ersetzt werden. 

55 

19. Verfahren nach Anspruch 17, bei dem die Abstande zwischen jedem individuellen Abstand und dem Median sab- 
stand dadurch bestimmt werden, dass ein individueller Abstand gemaB AS/AS med durch den Mediansabstand 
geteilt wird. 
20. Verfahren nach Anspruch 11, bei dem die Abstände zwischen jedem individuellen Abstand und dem Mediansabstand
dadurch bestimmt werden, dass ein individueller Abstand gemäß ΔS_i/ΔS_med durch den Mediansabstand
geteilt wird.

21. Vorrichtung zum Erzeugen von Hintergrund-Beruhigungsrauschen (CN) in einem System mit einem digitalen Mo- 
bilterminal unter Verwendung diskontinuierlicher Obertragung an ein Netzwerk, mit: 

einer Datenverarbeitungseinrichtung im digitalen Mobilterminal. die auf eine Sprechpause reagiert, urn einen 
Satz von Sprachcodierparametern zu puff em und um, innerhalb einer Mittelungsperiode, Sprachcodierpara- 
meter des Satzes, die nicht fur Hintergrundrauschen reprasentativ sind, durch Sprachcodierparameter zu er- 
setzen, die fur Hintergrundrauschen reprasentativ sind, wobei die Datenverarbeitungseinrichtung den Satz 
von Sprachcodierparametern mittelt und den gemittelten Satz von Sprachcodierparametern an das Netzwerk 
ubertragt. 

22. Vorrichtung nach Anspruch 21 , bei der der Datenprozessor Sprachcodierparameter des Satzes durch Ordnen des 
Satzes und durch Messen der Abstande der Sprachcodierparameter von einander zwischen individuellen Rahmen 
innerhalb der Mittelungsperiode ersetzt, wobei diejenigen Sprachcodierparameter identifiziert werden, die inner- 
halb der Mittelungsperiode die grdGten Abstande zu den anderen Parametern zeigen; und wobei, wenn die Ab- 
stande einen vorbestimmten Schwellenwert uberschreiten, die identifizierten Sprachcodierparameter durch den- 
jenigen Sprachcodierparameter ersetzt werden, der innerhalb der Mittelungsperiode den kleinsten gemessenen 
Abstand zu den anderen Sprachcodierparametern aufweist. 

23. Vorrichtung nach Anspruch 21 , bei der der Datenprozessor Sprachcodierparameter des Satzes dadurch ersetzt, 
dass er den Satz ordnet und Abstande der Sprachcodierparameter voneinander zwischen individuellen Rahmen 
innerhalb der Mittelungsperiode misst; wobei diejenigen Sprachcodierparameter identifiziert werden, die innerhalb 
der Mittelungsperiode die groBten Abstande zu den anderen Parametern aufweisen; und wobei, wenn die Abstan- 
de einen vorbestimmten Schwellenwert uberschreiten, ein identifizierter Sprachcodierparameter durch einen sol- 
chen mit einem Medianswert ersetzt wird. 

24. Vorrichtung nach Anspruch 21 , bei der die Datenverarbeitungseinrichtung Sprachcodierparameter fur Erregungs- 
verstfirkungswerte g und einen LSP(Line Spectral Pair)-Vektor fj unabhangig identifiziert und ersetzt. 

25. Vorrichtung nach Anspruch 21 , bei der die Datenverarbeitungseinrichtung Sprachcodierparameter f Or Erregungs- 
verstSrkungswerte g und einen LSP(Line Spectral Pair)-Vektor f, gemeinsam identifiziert und ersetzt. 



R even d I cat Ions 

1. Precede destine a generer du bruit de confort (CN) dans un terminal mobile numerique qui utilise une emission 
discontinue, comprenant les etapes consistent a : 

en reponse a une pause de la parole, mettre en tampon un ensemble de parametre de codage vocal, 
a I'interieur d'une peYiode de calcut de moyenne, remplacer les parametres de codage vocal de I'ensemble 
qui ne sont pas representatifs du bruit de fond par des parametres de codage vocal qui sont representatifs 
du bruit de fond, et 

moyenner I'ensemble des parametres de codage vocal. 

2. Procedd selon la revendication 1 , dans lequel I'etape de remplacement comprend les etapes consistant a : 

me surer des distances des parametres de codage vocal les uns par rapport aux autres entre des trames 
indrvlduelles a I'interieur de la periode de calcul de moyenne, 

identifier les parametres de codage vocal qui presentent les distances les plus grandes par rapport aux autres 
parametres a I'interieur de la periode de calcul de moyenne, et 

si les distances depassent un seuil predetermine, remplacer un parametre de codage vocal identifie par un 
parametre de codage vocal qui presente une distance mesuree la plus petite aux autres parametres de codage 
vocal a I'interieur de la periode de calcul de moyenne. 

3. Procede selon la revendication 1 , dans lequel i'etape de remplacement comprend les etapes consistant a : 
mesurer des distances des paramètres de codage vocal les uns par rapport aux autres entre des trames
individuelles à l'intérieur de la période de calcul de moyenne,

identifier les parametres de codage vocal qui presentent les distances les plus grandes aux autres parametres 
a Pinterieur de la periode de calcul de moyenne, et 

si les distances depassent un seuil predetermine, remplacer un parametre de codage vocal identifie par un 
parametre de codage vocal ayant une vateur de mediane. 

4. Precede selon la revendication 1 , dans lequel I'etape de calcul de moyenne comprend une etape consistant a 
calculer un gain d'excitation moyen g moy et des coefficients moyens spectraux a court terme f^yO). 

5. Precede selon la revendication 1 , dans lequel I'etape de remplacement comprend les etapes consistant a : 

former un ensemble de valeurs de gain d'excitation mises en tampon sur la periode de calcul de moyenne, 
ordonner Pensemble des valeurs de gain d'excitation mises en tampon, et 

executer une operation de remplacement par la mediane dans laquelle les L valeurs de gain d'excitation 
different le plus de la valeur de la mediane, oil la difference depasse une valeur de seuil predeterminee, sont 
remplacees par la valeur de la mediane de ['ensemble. 

6. Precede selon la revendication 5, dans lequel une longueur N de la penode de catcul de moyenne est un nombre 
impair, et dans lequel la mediane de ('ensemble ordonne est le ((N + 1 )/2) e element de I'ensemble. 

7. Precede selon la revendication 1 , et comprenant en outre une etape consistant a : 

former un ensemble de coefficient f(k) de paires spectrales de raies (LSP) mis en tampon, k = 1 , .... M sur la 
periode de calcul de moyenne, et 

determiner une distance spectrale des coefficients LSP f s (k) de la i e trame dans la periode de calcul de moyen- 
ne, aux coefficients LSP fj(k) de la f trame dans la periode de calcul de moyenne. 

8. Precede selon la revendication 7, ou I'etape consistant a determiner la distance spectrale est accomplie confor- 
mement a ('expression 



ou M est le degre du modele LPC, et f s (k) est le V? parametre LSP de la i e trame dans la periode de calcul de 
moyenne. 

9. Procede selon la revendication 7, et comprenant en outre une etape consistant a determiner la distance spectrale 
AS, des coefficients LSP fj(k) de la trame I aux coefficients LSP de toutes les autres trames j = 1 , .... N, I * J, a 
Pinterieur de la periode de calcul de moyenne de longueur N. 

10. Procede selon la revendication 9, dans lequel I'etape consistant a determiner la distance spectrale est accomplie 
en determinant la somme des distances spectrales AFty conformement a 

$$\Delta S_i = \sum_{j=1,\, j \neq i}^{N} \Delta R_{ij}$$

pour tout i= 1, N. 

11. Procede selon la revendication 9, et comprenant en outre les etapes consistant a : 

apres que les distances spectrales AS, ont ete trouvees pour chacun des vecteurs LSP fj a Pinterieur de la 
periode de calcul de moyenne, ordonner les distances spectrales conformement a leurs valeurs, 
considérer un vecteur f_i présentant la plus petite distance ΔS_i à l'intérieur de la période de calcul de moyenne
i = 1, 2, ..., N comme étant un vecteur de médiane f_med de la période de calcul de moyenne présentant une
distance notée ΔS_med, et

executer un remplacement par une mediane de P (0 £ P < N-1 ) vecteurs LSP f, par le vecteur de mediane f med . 

5 

12. Procede selon la revendication 2, dans lequel les etapes d'identification et de remplacement sont executees in- 
dependamment pour des valeurs de gain d'excitation g et des vecteurs f t de paires spect rales de raies (LSP). 

13. Procede selon la revendication 2, dans lequel les etapes consistant a identifier et remplacer sont combinees en- 
io semble pour des valeurs de gain d'excitation g et des vecteurs f ( de paires spectral es de raies (LSP). 

14. Procede selon la revendication 13, comprenant les etape consistant a : 

en reponse a la determination de ce que les parametres d codage vocal dans une trame individuelle doivent 
15 etre remplace par les valeurs de medianes des parametres, remplacer a la foi la valeur de gain d'excitation j 

et le vecteur LSP f ( de cett trame par les parametres respectifs de la trame contenant le parametres de me- 
dianes. 

15. Procede selon la revendication 14, et comprenant le etapes initiales consistant a : 

20 

determiner une distance ATg entre les parametres de la i e trame et de la j e trame de la periode de calcul de 
moyenne conformement a I'expression 

$$\Delta T_{ij} = \sum_{k=1}^{M} \left( f_i(k) - f_j(k) \right)^2 + w \left( g_i - g_j \right)^2,$$

ou M est le degr6 du modele LPC, f s (k) est le k 6 parametre LSP de la l e trame de la periode de calcul de moyenne, 
30 g. est le parametre de gain d'excitation de la f> trame. 

16. Precede selon la revendication 15, et comprenant en outre une etape consistant a : 

determiner une distance AS, des parametres de codage vocal de la trame I, pour tout i - 1 , N, aux parametres 
35 de codage vocal de toutes les autres trames j = 1,...,N,i*ja I'interieur de la periode de calcul de moyenne 

de longueur N conformement a 



$$\Delta S_i = \sum_{j=1,\, j \neq i}^{N} \Delta T_{ij}$$

pour tout I = 1, .... N. 

45 17. Procede selon la revendication 16, dans lequel apres que les distances ASj ont ete definies pour chacune des 
trames a I'interieur de la periode de calcul de moyenne, II comprend en outre les etapes consistant a : 

ordonner les distances en fonction de leurs valeurs, et 

considgrer une trame ayant la distance la plus petite AS, a I'interieur de la penode de calcul de moyenne i = 
50 1 , 2, N comme etant une trame de mediane, ayant une distance AS med , de la periode de calcul de moyenne, 

la trame de mediane ayant des parametres de codage vocal g,^ et f med . 

18. Procede selon la revendication 19, et comprenant une etape consistant a executer un remplacement par la me- 
diane des trames de parametres de codage vocal a I'interieur de la periode de calcul de moyenne i = 1, 2, .... N, 

55 dans lequel les parametres gj et f, de L(0 < L < N-1 ) trames sont remplaces par les parametres g med et f^ de la 

trame de m6diane. 

19. Procédé selon la revendication 17, dans lequel des différences entre chaque distance individuelle et la distance
de médiane sont déterminées en divisant une distance individuelle par la distance de médiane conformément à
ΔS_i/ΔS_med.

20. Precede selon la revendication 11 ; dans lequel les differences entre chaque distance individuelle et ia distance 
5 de mediane sont determinees en divisant une distance individuelle par la distance de mediane conformement a 

AS/AS med . 

21. Dispositif destine a generer un bruit de contort (CN) dans un systeme comportant un terminal mobile numerique 
qui utilise une emission discontinue vers un reseau, comprenant : 

w 

un moyen de traitement de donnees dans ledit terminal mobile numerique qui est sensible a une pause de la 
parole pour mettre en memoire tampon un ensemble de parametres de codage vocal et, a I'interieur d'une 
periode de calcul de moyenne, pour remplacer des parametres de codage vocal de I'ensemble qui ne sont 
pas representatifs du bruit de fond, par des parametres de codage vocal qui sont representatifs du bruit de 
15 fond, ledit moyen de traitement de donnees calculant ta moyenne de I'ensemble des parametres de codage 

vocal et emettant I'ensemble moyenne de parametres de codage vocal sur le reseau. 

22. Dispositif selon la revendication 21 , dans lequel ledit processeur de donnees remplace des parametres de codage 
vocal de i'ensemble en ordonnant I'ensemble et en mesurant des distances des parametres de codage vocal les 

20 uns par rapport aux autres entre des trames individuelles a I'interieur de la periode de calcul de moyenne, en 

identif iant les parametres de codage vocal qui ont les plus grandes distances par rapport aux autres parametres 
a I'interieur de la periode de calcul de moyenne, et, si les distances depassent un seuil predetermine, en remplacant 
les parametres de codage vocal identifies par un parametre de codage vocal qui presente la distance mesuree la 
plus petite aux autres parametres de codage vocal & I'interieur de la periode de calcul de moyenne. 

25 

23. Dispositif selon la revendication 21 , dans lequel ledit processeur de donnees remplace des parametres de codage 
vocal de I'ensemble en ordonnant I'ensemble et en mesurant des distances des parametres de codage vocal les 
uns par rapport aux autres entre des trames individuelles a I'interieur de la periode de calcul de moyenne, en 
identif iant les parametres de codage vocal qui p res en tent les distances les plus grandes par rapport aux autres 

30 parametres a I'interieur de la periode de calcul de moyenne, et, si la distance depasse un seuil predetermine, en 

remplacant un parametre de codage vocal identif ie parun param etrede codage vocal ayant une valeurde mediane. 

24. Dispositif selon la revendication 21 , dans lequel ledit moyen de traitement de donnees identifie et remplace des 
parametres de codage vocal independamment pour des valeurs de gain d'excitation g et un vecteur fj de paire 

55 spectrale de raies (LSP). 

25. Dispositif selon la revendication 21 , dans lequel ledit moyen de traitement de donnees identifie et remplace en- 
semble des parametres de codage vocal pour des valeurs de gain d'excitation g et un vecteur f { de paire spectrale 
de rales (LSP). 